(57) Abstract

A method for producing one of the following proteins in transgenic monocot plant cells is disclosed: (i) mature, glycosylated
α1-antitrypsin (AAT) having the same N-terminal amino acid sequence as mature AAT produced in humans and a glycosylation pattern
which increases serum half-life substantially over that of mature non-glycosylated AAT; (ii) mature, glycosylated antithrombin III (ATIII)
having the same N-terminal amino acid sequence as mature ATIII produced in humans; (iii) mature human serum albumin (HSA) having
the same N-terminal amino acid sequence as mature HSA produced in humans and having the folding pattern of native mature HSA as
evidenced by its bilirubin-binding characteristics; and (iv) mature, active subtilisin BPN' (BPN') having the same N-terminal amino acid
sequence as BPN' produced in Bacillus. Monocot plant cells are transformed with a chimeric gene which includes a DNA coding sequence
encoding a fusion protein having (i) an N-terminal moiety corresponding to a rice α-amylase signal sequence peptide and (ii) immediately
adjacent the C-terminal amino acid of said peptide, a protein moiety corresponding to the mature protein to be produced.

Production of Mature Proteins in Plants



Field of the Invention 

The present invention relates to the production of mature proteins in plant cells, and in
particular, to the production of proteins in mature secreted form.

Background of the Invention 

A major commercial focus of biotechnology is the recombinant production of proteins, 
including both industrial enzymes and proteins that have important therapeutic uses. 

Therapeutic proteins are commonly produced recombinantly by microbial expression
systems, such as in E. coli and the yeast system S. cerevisiae. To date, the cost of recombinant
proteins produced in a microbial host has limited the availability of a variety of therapeutically
important proteins, such as human serum albumin (HSA) and α1-antitrypsin (AAT), to the extent
that the proteins are in short supply.

Some therapeutic proteins appear to rely on glycosylation for optimal activity or stability,
and the general inability of microbial systems to glycosylate or properly glycosylate mammalian
proteins has also limited the usefulness of these recombinant expression systems. In some cases,
proper protein folding cannot take place, because of the need for mammalian-specific foldases or
other folding conditions.

To some extent, protein expression in cultured mammalian cells, or in transgenic animals,
may overcome the limitations of microbial expression systems. However, the cost per weight ratio
of the protein is still high in mammalian expression systems, and the risk of protein contamination
by mammalian viruses may be a significant regulatory problem. Protein production by transgenic
animals also carries the risk of genetic variation from one generation to another. The attendant risk
is variation in the recombinant protein produced, for example, variation in protein processing to
yield a mature active protein with a different N-terminal residue.

It would therefore be desirable to produce selected therapeutic and industrial proteins in a
protein expression system that largely overcomes problems associated with microbial and
mammalian-cell systems. In particular, production of the proteins should allow large volume
production at low cost, and yield properly processed and glycosylated proteins. The production
system should also have a relatively stable genotype from generation to generation. These aims are
achieved, in the present invention, for the therapeutic proteins AAT, HSA, and antithrombin III
(ATIII), and the industrial enzyme subtilisin BPN'.

Human α1-antitrypsin

Human α1-antitrypsin (AAT) is a monomer with a molecular weight of about 52 kD.
Normal AAT contains 394 residues, with three complex oligosaccharide units exposed to the surface
of the molecule, linked to asparagines 46, 83, and 247 (Carroll, P., et al., Nature (1982) 298:329).

AAT is the major plasma proteinase inhibitor whose primary function is to control the
proteolytic activity of trypsin, elastase, and chymotrypsin in plasma. In particular, the protein is a
potent inhibitor of neutrophil elastase, and a deficiency of AAT has been observed in a number of
patients with chronic emphysema of the lungs. A proportion of individuals with serum deficiency
of AAT may progress to cirrhosis and liver failure (e.g., Wu, Y., et al., BioEssays 13(4):163
(1991)).

Because of the key role of AAT as an elastase inhibitor, and because of the prevalence of
genetic diseases resulting in deficient serum levels of AAT, there has been an active interest in
recombinant synthesis of AAT for human therapeutic use. To date, this approach has not been
satisfactory for AAT produced by recombinant methods, for the reasons discussed above.

Human Antithrombin III

Antithrombin III (ATIII) is the major inhibitor of thrombin and factor Xa, and to a lesser
extent, other serine proteases generated during the coagulation process, e.g., factors IXa, XIa, and
XIIa. The inhibitory effect of ATIII is accelerated dramatically by heparin. In patients with a
history of deep vein thrombosis and pulmonary embolism, the prevalence of ATIII deficiency is
2-3%.

ATIII protein has been useful in treating hereditary ATIII deficiency and has wide clinical
applications for the prevention of thrombosis in high risk situations, such as surgery and delivery,
and for treating acute thrombotic episodes, when used in combination with heparin.

ATIII is a glycoprotein with a molecular weight of 58,200, having 432 amino acids and
containing three disulfide linkages and four asparagine-linked biantennary carbohydrate chains.
Because of the key role of ATIII as an anti-thrombotic agent, and because of the broad clinical
potential in anti-thrombosis therapy, there has been an active interest in recombinant synthesis of
ATIII for human therapeutic use. To date, this approach has not been satisfactory for ATIII
produced by microbial or mammalian recombinant methods, for the reasons discussed above.

Human Serum Albumin

Serum albumin is the main protein component of plasma. Its main function is regulation of
colloidal osmotic pressure in the bloodstream. Serum albumin binds numerous ions and small
molecules, including Ca2+, Na+, K+, fatty acids, hormones, bilirubin and certain drugs.

Human serum albumin (HSA) is expressed as a 609 amino acid prepro-protein which is
further processed by removal of an amino-terminal peptide and an additional six amino acid residues
to form the mature protein. The mature protein found in human serum is a monomeric,
unglycosylated protein 585 amino acids in length (66 kDal), with a globular structure maintained by
17 disulfide bonds. The pattern of disulfide links forms a structural unit of one small and two large
disulfide-linked double loops (Geisow, M.J., et al. (1977) Biochem. J. 163:477-484) which forms a
high-affinity bilirubin binding site.

HSA is used to expand blood volume and raise low blood protein levels in cases of shock,
trauma, and post-surgical recovery. HSA is often administered in emergency situations to stabilize
blood pressure.

Because of the key role of HSA as an osmotic stabilizing agent, and because of its broad
clinical potential in, e.g., plasma replacement therapy, there has been an active interest in
recombinant synthesis of HSA for human therapeutic use. This approach has not been satisfactory
for HSA produced by microbial or mammalian recombinant methods, for the reasons discussed
above.

Subtilisin BPN' 

Subtilisin BPN' (BPN') is an important industrial enzyme, particularly for use as a
detergent enzyme. Several groups have reported amino acid substitution modifications of the
enzyme that are effective in enhancing the activity, pH optimum, stability and/or therapeutic use of
the enzyme.

BPN' is expressed as a 381 amino acid preproenzyme, including a 35 amino acid sequence
required for secretion and a 77 amino acid moiety which serves as a chaperon to facilitate folding.
Studies indicate that the pro moiety acts in trans outside of cells.

To date, large-scale production of BPN' is predominantly by microbial fermentation, which
has relatively high costs associated with it. In addition, the enzyme tends to auto-degrade at optimal
fermentation growth-medium conditions.

Summary of the Invention

In one aspect, the invention includes a method of producing, in monocot plant cells, a
mature heterologous protein selected from the group consisting of (i) mature, glycosylated
α1-antitrypsin (AAT) having the same N-terminal amino acid sequence as mature AAT produced in
humans and a glycosylation pattern which increases serum half-life substantially over that of
non-glycosylated mature AAT; (ii) mature, glycosylated antithrombin III (ATIII) having the same
N-terminal amino acid sequence as mature ATIII produced in humans; (iii) mature human serum
albumin (HSA) having the same N-terminal amino acid sequence as mature HSA produced in
humans and having the folding pattern of native mature HSA as evidenced by its bilirubin-binding
characteristics; and (iv) mature, active subtilisin BPN' (BPN'), glycosylated or non-glycosylated,
having the same N-terminal amino acid sequence as BPN' produced in Bacillus.

The method includes obtaining monocot cells transformed with a chimeric gene having (i) a
monocot transcriptional regulatory region, inducible by addition or removal of a small molecule, or
during seed maturation, (ii) a first DNA sequence encoding the heterologous protein, and (iii) a
second DNA sequence encoding a signal peptide. The second DNA sequence is operably linked to
the transcriptional regulatory region and to the first DNA sequence. The first DNA sequence is in
translation-frame with the second DNA sequence, and the two sequences encode a fusion protein.

The transformed cells are cultivated under conditions effective to induce the transcriptional
regulatory region, thereby promoting expression of the fusion protein and secretion of the mature
heterologous protein from the transformed cells. The mature heterologous protein produced by the
transformed cells is then isolated.

In one embodiment of the method, the first DNA sequence encodes pro-subtilisin BPN'
(proBPN'), the cultivating includes cultivating the transformed cells at a pH between 5 and 6, and
the isolating step includes incubating the proBPN' under conditions effective to allow its
autoconversion to active mature BPN'. In another embodiment, the first DNA sequence encodes
mature BPN', and the cells are transformed with a second chimeric gene containing (i) a
transcriptional regulatory region inducible by addition or removal of a small molecule, (ii) a third DNA
sequence encoding the pro-peptide moiety of BPN', and (iii) a fourth DNA sequence encoding a
signal polypeptide. The fourth DNA sequence is operably linked to the transcriptional regulatory
region and to the third DNA sequence, and the signal polypeptide is in translation-frame with the
pro-peptide moiety and is effective to facilitate secretion of expressed pro-peptide moiety from the
transformed cells. The cultivating step includes cultivating the transformed cells at a pH between 5
and 6, and the isolating step includes incubating the mature BPN' and the pro-moiety under
conditions effective to allow the conversion of BPN' by the pro-moiety to active mature BPN'.

In another embodiment of the method, the signal peptide is the RAmy3D signal peptide
(SEQ ID NO:1) or the RAmy1A signal peptide (SEQ ID NO:4). The coding sequence of the signal
peptide may be a codon-optimized sequence, such as the codon-optimized RAmy3D sequence
identified as SEQ ID NO:3. The first DNA sequence may also be codon-optimized. Exemplary
codon-optimized signal peptide-heterologous protein fusion protein coding sequences include
3D-AAT (SEQ ID NO:18), 3D-ATIII (SEQ ID NO:19), and 3D-HSA (SEQ ID NO:20). The first
DNA sequence may further contain codon substitutions which eliminate one or more potential
glycosylation sites present in the native amino acid sequence of the heterologous protein, such as the
codon-optimized sequence encoding 3D-proBPN' (SEQ ID NO:21).

The transcriptional regulatory region may be a
promoter derived from a rice or barley α-amylase gene, including RAmy1A, RAmy1B, RAmy2A,
RAmy3A, RAmy3B, RAmy3C, RAmy3D, RAmy3E, pM/C, gKAmy141, gKAmy155, Amy32b, or
HV18. The chimeric gene may further include, between the transcriptional regulatory region and the
fusion protein coding sequence, the 5' untranslated region (5' UTR) of an inducible monocot gene
such as one of the rice or barley α-amylase genes described above. One preferred 5' UTR is that
from the RAmy1A gene, which is effective to enhance the stability of the gene transcript. The
chimeric gene may further include, downstream of the coding sequence, the 3' untranslated region
(3' UTR) from an inducible monocot gene, such as one of the rice or barley α-amylase genes
mentioned above. One preferred 3' UTR is from the RAmy1A gene.

Where the method is employed in protein production in a monocot cell culture, preferred
promoters are the RAmy3D and RAmy3E gene promoters, which are upregulated by sugar
depletion in cell culture. Where the gene is employed in protein production in germinating seeds, a
preferred promoter is the RAmy1A gene promoter, which is upregulated by gibberellic acid during
seed germination. Where the gene is upregulated during seed maturation, a preferred promoter is the
barley endosperm-specific B1-hordein promoter.
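
The pairing of production setting and preferred promoter stated in the preceding paragraph can be summarized as a small lookup table. The Python sketch below is illustrative only: the setting keys and the helper name `preferred_promoters` are hypothetical labels chosen for this example and are not terms used in the specification.

```python
# Preferred promoters by production setting, as stated in the paragraph above.
# The setting keys and the helper name are hypothetical labels for illustration.
PREFERRED_PROMOTERS = {
    "cell_culture": {
        "promoters": ["RAmy3D", "RAmy3E"],
        "upregulated_by": "sugar depletion",
    },
    "germinating_seed": {
        "promoters": ["RAmy1A"],
        "upregulated_by": "gibberellic acid",
    },
    "maturing_seed": {
        "promoters": ["B1-hordein"],
        "upregulated_by": "seed maturation",
    },
}


def preferred_promoters(setting):
    """Return the list of preferred promoter names for a production setting."""
    try:
        return PREFERRED_PROMOTERS[setting]["promoters"]
    except KeyError:
        raise ValueError(f"unknown production setting: {setting!r}") from None
```

For instance, `preferred_promoters("germinating_seed")` returns the single-entry list naming the RAmy1A promoter, while an unrecognized setting raises `ValueError` rather than silently returning a default.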

The invention also includes a mature heterologous protein produced by the above method.
The protein has a glycosylation pattern characteristic of the monocot plant in which the protein is
produced. The glycosylated protein is selected from the group consisting of (i) mature glycosylated
α1-antitrypsin (AAT) having the same N-terminal amino acid sequence as mature AAT produced in
humans and having a glycosylation pattern which increases serum half-life substantially over that of
non-glycosylated mature AAT; (ii) mature glycosylated antithrombin III (ATIII) having the same
N-terminal amino acid sequence as mature ATIII produced in humans; and (iii) mature glycosylated
subtilisin BPN' (BPN') having the same N-terminal amino acid sequence as BPN' produced in
Bacillus.

The invention also includes plant cells and seeds capable of producing the mature
heterologous proteins according to the above method.

These and other objects and features of the invention will be more fully understood when
the following detailed description of the invention is read in conjunction with the accompanying
drawings.



Brief Description of the Figures

Fig. 1 shows, in the lower row, the amino acid sequence of a RAmy3D signal sequence
portion employed in the invention, identified as SEQ ID NO:1; in the middle row, the
corresponding native coding sequence, identified as SEQ ID NO:2; and in the upper row, a
corresponding codon-optimized sequence, identified as SEQ ID NO:3;

Fig. 2 illustrates the components of a chimeric gene constructed in accordance with an
embodiment of the invention;

Figs. 3A and 3B illustrate the construction of an exemplary transformation vector for use in
transforming a monocot plant, for production of a mature protein in cell culture in accordance with
one embodiment of the invention (native mature AAT coding sequence under control of the
RAmy3D promoter and signal sequence);

Fig. 4 illustrates factors in the metabolic regulation of AAT production in rice cell culture; 

Fig. 5 shows immunodetection of AAT using antibody raised against the C-terminal region
of AAT;

Fig. 6 shows Western blot analysis of AAT produced by transformed rice cell lines 18F,
11B, and 27F;

Fig. 7 shows the time course of elastase:AAT complex formation in human and rice- 
produced forms of AAT; 

Fig. 8 shows an N-terminal sequence for mature α1-antitrypsin (AAT) produced in
accordance with the invention, identified herein as SEQ ID NO:22;

Fig. 9 shows a Western blot of ATIII produced in accordance with the invention;

Fig. 10 shows a Western blot of plant-produced BPN', comparing expression from
codon-optimized and native coding sequences;

Fig. 11 compares the specific activity of BPN' codon-optimized (AP106) vs. BPN' native
(AP101) expression in rice callus cell culture; and

Fig. 12 shows a Western blot of HSA produced in germinating seeds in accordance with the
invention.

Brief Description of the Sequences

SEQ ID NO:1 is the amino acid sequence of the RAmy3D signal peptide;
SEQ ID NO:2 is the native sequence encoding the RAmy3D signal peptide;
SEQ ID NO:3 is a codon-optimized sequence encoding the RAmy3D signal peptide;
SEQ ID NO:4 is the amino acid sequence of the RAmy1A signal peptide;
SEQ ID NO:5 is the 5' UTR derived from the RAmy1A gene;
SEQ ID NO:6 is the 3' UTR derived from the RAmy1A gene;
SEQ ID NO:7 is the amino acid sequence of mature α1-antitrypsin (AAT);
SEQ ID NO:8 is the native DNA coding sequence of mature AAT;
SEQ ID NO:9 is the amino acid sequence of mature antithrombin III (ATIII);
SEQ ID NO:10 is the native DNA coding sequence of mature ATIII;
SEQ ID NO:11 is the amino acid sequence of mature human serum albumin (HSA);
SEQ ID NO:12 is the native DNA coding sequence of mature HSA;
SEQ ID NO:13 is the amino acid sequence of native proBPN';
SEQ ID NO:14 is the native DNA coding sequence of proBPN';
SEQ ID NO:15 is the amino acid sequence of the "pro" moiety of BPN';
SEQ ID NO:16 is the amino acid sequence of native mature BPN';
SEQ ID NO:17 is the amino acid sequence of a mature BPN' variant in which all potential
N-glycosylation sites are removed according to Table 2;
SEQ ID NO:18 is a codon-optimized sequence encoding the RAmy3D signal
sequence/mature α1-antitrypsin fusion protein;
SEQ ID NO:19 is a sequence encoding the RAmy3D signal sequence/mature antithrombin
III fusion protein, with a codon-optimized RAmy3D coding sequence fused to the native mature
ATIII coding sequence;
SEQ ID NO:20 is a sequence encoding the RAmy3D signal sequence/mature human serum
albumin fusion protein, with a codon-optimized RAmy3D coding sequence fused to the native
mature HSA coding sequence;
SEQ ID NO:21 is a codon-optimized sequence encoding the RAmy3D signal
sequence/prosubtilisin BPN' fusion protein;
SEQ ID NO:22 is the N-terminal sequence of mature α1-antitrypsin produced in accordance
with the invention;

SEQ ID NO:23 is an oligonucleotide used to prepare the intermediate p3DProSig construct
of Example 1;
SEQ ID NO:24 is the complement of SEQ ID NO:23;
SEQ ID NO:25 is an oligonucleotide used to prepare the intermediate p3DProSigENDlink
construct of Example 1;
SEQ ID NO:26 is the complement of SEQ ID NO:25;
SEQ ID NO:27 is one of six oligonucleotides used to prepare the intermediate p1AProSig
construct of Example 1;
SEQ ID NO:28 is one of six oligonucleotides used to prepare the intermediate p1AProSig
construct of Example 1;
SEQ ID NO:29 is one of six oligonucleotides used to prepare the intermediate p1AProSig
construct of Example 1;
SEQ ID NO:30 is one of six oligonucleotides used to prepare the intermediate p1AProSig
construct of Example 1;
SEQ ID NO:31 is one of six oligonucleotides used to prepare the intermediate p1AProSig
construct of Example 1;
SEQ ID NO:32 is one of six oligonucleotides used to prepare the intermediate p1AProSig
construct of Example 1;
SEQ ID NO:33 is the N-terminal primer used to PCR-amplify the AAT coding sequence
according to Example 1; and
SEQ ID NO:34 is the C-terminal primer used to PCR-amplify the AAT coding sequence
according to Example 1.

Detailed Description of the Invention 

I. Definitions

The terms below have the following meaning, unless indicated otherwise in the
specification.

"Cell culture" refers to cells and cell clusters, typically callus cells, growing on or
suspended in a suitable growth medium.

"Germination" refers to the breaking of dormancy in a seed and the resumption of metabolic
activity in the seed, including the production of enzymes effective to break down starches in the
seed endosperm.

"Inducible" means a promoter that is upregulated by the presence or absence of a small
molecule. It includes both indirect and direct inducement.

"Inducible during germination" refers to promoters which are substantially silent but not
totally silent prior to germination but are turned on substantially (greater than 25%) during
germination and development in the seed. Examples of promoters that are inducible during
germination are presented below.

"Small molecules", in the context of promoter induction, are typically small organic or
bioorganic molecules less than about 1 kDal. Examples of such small molecules include sugars,
sugar-derivatives (including phosphate derivatives), and plant hormones (such as gibberellic or
abscisic acid).

"Specifically regulatable" refers to the ability of a small molecule to preferentially affect
transcription from one promoter or group of promoters (e.g., the α-amylase gene family), as
opposed to non-specific effects, such as enhancement or reduction of global transcription within a
cell by a small molecule.

"Seed maturation" or "grain development" refers to the period starting with fertilization in
which metabolizable reserves, e.g., sugars, oligosaccharides, starch, phenolics, amino acids, and
proteins, are deposited, with and without vacuole targeting, to various tissues in the seed (grain),
e.g., endosperm, testa, aleurone layer, and scutellar epithelium, leading to grain enlargement, grain
filling, and ending with grain desiccation.

"Inducible during seed maturation" refers to promoters which are turned on substantially
(greater than 25%) during seed maturation.

"Heterologous DNA" or "foreign DNA" refers to DNA which has been introduced into
plant cells from another source, or which is from a plant source, including the same plant source,
but which is under the control of a promoter or terminator that does not normally regulate
expression of the heterologous DNA.

"Heterologous protein" is a protein, including a polypeptide, encoded by a heterologous
DNA.

A "transcription regulatory region" or "promoter" refers to nucleic acid sequences that
influence and/or promote initiation of transcription. Promoters are typically considered to include
regulatory regions, such as enhancer or inducer elements.

A "chimeric gene," in the context of the present invention, typically comprises a promoter
sequence operably linked to a DNA sequence that encodes a heterologous gene product, e.g., a
selectable marker gene or a fusion protein gene. A chimeric gene may also contain further
transcription regulatory elements, such as transcription termination signals, as well as translation
regulatory signals, such as termination codons.

"Operably linked" refers to components of a chimeric gene or an expression cassette that
function as a unit to express a heterologous protein. For example, a promoter operably linked to a
heterologous DNA, which encodes a protein, promotes the production of functional mRNA
corresponding to the heterologous DNA.

A "product" encoded by a DNA molecule includes, for example, RNA molecules and
polypeptides.

"Removal" in the context of a metabolite includes both physical removal as by washing and 
the depletion of the metabolite through the absorption and metabolizing of the metabolite by the 
cells. 

"Substantially isolated" is used in several contexts and typically refers to the at least partial
purification of a protein or polypeptide away from unrelated or contaminating components.
Methods and procedures for the isolation or purification of proteins or polypeptides are known in
the art.

"Stably transformed" as used herein refers to a cereal cell or plant that has foreign nucleic
acid stably integrated into its genome which is transmitted through multiple generations.

"α1-antitrypsin" or "AAT" refers to the protease inhibitor which has an amino acid sequence
substantially identical or homologous to the AAT protein identified by SEQ ID NO:7.

"Antithrombin III" or "ATIII" refers to the heparin-activated inhibitor of thrombin and
factor Xa, and which has an amino acid sequence substantially identical or homologous to the ATIII
protein identified by SEQ ID NO:9.

"Human serum albumin" or "HSA" refers to a protein which has an amino acid sequence
substantially identical or homologous to the mature HSA protein identified by SEQ ID NO:11.

"Subtilisin" or "subtilisin BPN'" or "BPN'" refers to the protease enzyme produced
naturally by B. amyloliquefaciens, and having the sequence of SEQ ID NO:16, or a sequence
homologous therewith.

"proBPN'" refers to a form of BPN' having an approximately 78 amino-acid "pro" moiety
that functions as a chaperon polypeptide to assist in folding and activation of the BPN', and having
the sequence in SEQ ID NO:13, or a sequence homologous therewith.

"Codon optimization" refers to changes in the coding sequence of a gene to replace native
codons with those corresponding to optimal codons in the host plant.
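
The definition of codon optimization above can be illustrated with a minimal Python sketch, offered as an assumption-laden example rather than the procedure actually used to prepare the sequences of this specification. It replaces each codon with a host-preferred synonymous codon while leaving the encoded amino acid sequence unchanged. The function names and the small `EXAMPLE_PREFERRED` table are hypothetical; the table is a placeholder and is not measured rice or barley codon usage.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence (length divisible by 3) into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence is not a whole number of codons")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna, preferred):
    """Swap each codon for the preferred synonymous codon, when one is listed.

    `preferred` maps a one-letter amino acid to the codon favored in the host.
    Amino acids missing from the map keep their native codon.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence is not a whole number of codons")
    optimized = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        replacement = preferred.get(amino_acid, codon)
        # Guard against a preference table that would change the protein.
        if CODON_TABLE[replacement] != amino_acid:
            raise ValueError(f"{replacement} does not encode {amino_acid}")
        optimized.append(replacement)
    return "".join(optimized)


# Hypothetical preference table for illustration only (not real codon-usage data).
EXAMPLE_PREFERRED = {"A": "GCC", "L": "CTC", "K": "AAG"}
```

Because only synonymous substitutions are accepted, `translate(codon_optimize(seq, table))` equals `translate(seq)` for any valid input, which is the property the definition requires.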

A DNA sequence is "derived from" a gene, such as a rice or barley α-amylase gene, if it
corresponds in sequence to a segment or region of that gene. Segments of genes which may be
derived from a gene include the promoter region, the 5' untranslated region, and the 3' untranslated
region of the gene.

II. Transformed plant cells 

The plants used in the process of the present invention are derived from monocots,
particularly the members of the taxonomic family known as the Gramineae. This family includes all
members of the grass family of which the edible varieties are known as cereals. The cereals include
a wide variety of species such as wheat (Triticum sps.), rice (Oryza sps.), barley (Hordeum sps.),
oats (Avena sps.), rye (Secale sps.), corn (Zea sps.) and millet (Pennisetum sps.). In the present
invention, preferred family members are rice and barley.

Plant cells or tissues derived from the members of the family are transformed with
expression constructs (i.e., plasmid DNA into which the gene of interest has been inserted) using a
variety of standard techniques (e.g., electroporation, protoplast fusion or microparticle
bombardment). The expression construct includes a transcription regulatory region (promoter)
whose transcription is specifically upregulated by the presence or absence of a small molecule, such
as the reduction or depletion of sugar, e.g., sucrose, in culture medium, or in plant tissues, e.g.,
germinating seeds. In the present invention, particle bombardment is the preferred transformation
procedure.

The construct also includes a gene encoding a mature heterologous protein in a form
suitable for secretion from plant cells. The gene encoding the recombinant heterologous protein is
placed under the control of a metabolically regulated promoter. Metabolically regulated promoters
are those in which mRNA synthesis, or transcription, is repressed or upregulated by a small
metabolite or hormone molecule, such as the rice RAmy3D and RAmy3E promoters, which are
upregulated by sugar depletion in cell culture. For protein production in germinating seeds from
regenerated transgenic plants, a preferred promoter is the RAmy1A promoter, which is upregulated
by gibberellic acid during seed germination. The expression construct also utilizes additional
regulatory DNA sequences, e.g., preferred codons and termination sequences, to promote efficient
translation of AAT, as will be described.



A. Plant Expression Vector 

Expression vectors for use in the present invention comprise a chimeric gene (or expression
cassette), designed for operation in plants, with companion sequences upstream and downstream
from the expression cassette. The companion sequences will be of plasmid or viral origin and
provide necessary characteristics to the vector to permit the vectors to move DNA from bacteria to
the desired plant host. Suitable transformation vectors are described in related application PCT WO
95/14099, published May 25, 1995, which is incorporated by reference herein. Suitable
components of the expression vector, including an inducible promoter, coding sequence for a signal
peptide, coding sequence for a mature heterologous protein, and suitable termination sequences are
discussed below. One exemplary vector is the p3D(AAT)v1.0 vector illustrated in Figs. 3A and 3B.



A1. Promoters

The transcription regulatory or promoter region is chosen to be regulated in a manner
allowing for induction under selected cultivation conditions, e.g., sugar depletion in culture or
water uptake followed by gibberellic acid production in germinating seeds. Suitable promoters, and
their method of selection, are detailed in above-cited PCT application WO 95/14099. Examples of
such promoters include those that transcribe the cereal α-amylase genes and sucrose synthase genes,
and are repressed or induced by small molecules, like sugars, sugar depletion or phytohormones
such as gibberellic acid or abscisic acid. Representative promoters include the promoters from the
rice α-amylase RAmy1A, RAmy1B, RAmy2A, RAmy3A, RAmy3B, RAmy3C, RAmy3D, and
RAmy3E genes, and from the pM/C, gKAmy141, gKAmy155, Amy32b, and HV18 barley
α-amylase genes. These promoters are described, for example, in Advances in Plant
Biotechnology, Ryu, D.D.Y., et al., Eds., Elsevier, Amsterdam, 1994, p. 37, and references cited
therein. Other suitable promoters include the sucrose synthase and sucrose-6-phosphate-synthetase
(SPS) promoters from rice and barley.

Other suitable promoters include promoters which are regulated in a manner allowing for
induction under seed-maturation conditions. Examples of such promoters include those associated
with the following monocot storage proteins: rice glutelins, oryzins, and prolamines, barley
hordeins, wheat gliadins and glutelins, maize zeins and glutelins, oat glutelins, and sorghum
kafirins, millet pennisetins, and rye secalins.

A preferred promoter for expression in germinating seeds is the rice α-amylase RAmy1A
promoter, which is upregulated by gibberellic acid. Preferred promoters for expression in cell
culture are the rice α-amylase RAmy3D and RAmy3E promoters, which are strongly upregulated by
sugar depletion in the culture. These promoters are also active during seed germination. A
preferred promoter for expression in maturing seeds is the barley endosperm-specific B1-hordein
promoter (Brandt, A., et al., (1985) Carlsberg Res. Commun. 50:333-345).

The chimeric gene may further include, between the promoter and coding sequences, the 5'
untranslated region (5' UTR) of an inducible monocot gene, such as the 5' UTR derived from one
of the rice or barley α-amylase genes mentioned above. One preferred 5' UTR is that derived from
the RAmy1A gene, which is effective to enhance the stability of the gene transcript. This 5' UTR
has the sequence given by SEQ ID NO:5 herein.

A2. Signal Sequences 

In addition to encoding the protein of interest, the chimeric gene encodes a signal sequence
(or signal peptide) that allows processing and translocation of the protein, as appropriate. Suitable
signal sequences are described in above-referenced PCT application WO 95/14099. One preferred
signal sequence is identified as SEQ ID NO:1 and is derived from the RAmy3D promoter. Another
preferred signal sequence is identified as SEQ ID NO:4 and is derived from the RAmy1A promoter.
The plant signal sequence is placed in frame with a heterologous nucleic acid encoding a mature
protein, forming a construct which encodes a fusion protein having an N-terminal region
corresponding to the signal peptide and, immediately adjacent to the C-terminal amino acid of the
signal peptide, the N-terminal amino acid of the mature heterologous protein. The expressed fusion
protein is subsequently secreted and processed by signal peptidase cleavage precisely at the junction
of the signal peptide and the mature protein, to yield the mature heterologous protein.
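
The in-frame arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences are hypothetical placeholders (they are not SEQ ID NO:1 or any other sequence of this document), and the function names are invented for the example.

```python
def fuse_in_frame(signal_cds: str, mature_cds: str) -> str:
    """Join a signal-peptide coding sequence directly to a mature-protein
    coding sequence, with no linker codons in between."""
    signal_cds = signal_cds.replace(" ", "").upper()
    mature_cds = mature_cds.replace(" ", "").upper()
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal sequence should begin with an ATG start codon")
    if len(signal_cds) % 3 or len(mature_cds) % 3:
        raise ValueError("both coding sequences must consist of whole codons")
    # The last signal codon is immediately followed by the first mature codon.
    return signal_cds + mature_cds


def junction_codons(fusion: str, signal_len: int) -> tuple:
    """Return (last signal codon, first mature codon) at the fusion junction."""
    return fusion[signal_len - 3:signal_len], fusion[signal_len:signal_len + 3]


# Hypothetical placeholder sequences, used only to exercise the functions.
signal = "ATGAAGAACACCAGCAGCCTCTGCCTCGCC"   # 10 codons, hypothetical
mature = "GAGGACCCCCAGGGCGACGCCTAA"         # 8 codons incl. stop, hypothetical

fusion = fuse_in_frame(signal, mature)
print(junction_codons(fusion, len(signal)))
```

Checking the junction codons in this way is a quick guard against a frame shift or a stray linker codon between the signal peptide and the first residue of the mature protein.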

In another embodiment of the invention, the coding sequence in the fusion protein gene, in
at least the coding region for the signal sequence, may be codon-optimized for optimal expression in
plant cells, e.g., rice cells, as described below. The upper row in Fig. 1 shows one codon-optimized
coding sequence for the RAmy3D signal sequence, identified herein as SEQ ID NO:3.

A3. Naturally-Occurring Heterologous Protein Coding Sequences 
(i) α1-Antitrypsin: Mature human AAT is composed of 394 amino acids, having the
sequence identified herein as SEQ ID NO:7. The protein has N-glycosylation sites at asparagines
46, 83 and 247. The corresponding native DNA coding sequence is identified herein as SEQ ID
NO:8.

(ii) Antithrombin III: Mature human ATIII is composed of 432 amino acids, having the
sequence identified herein as SEQ ID NO:9. The protein has N-glycosylation sites at the four
asparagine residues 96, 135, 155, and 192. The corresponding native DNA coding sequence is
identified herein as SEQ ID NO:10.

(iii) Human serum albumin: Mature HSA as found in human serum is composed of 585
amino acids, having the sequence identified herein as SEQ ID NO:11. The protein has no N-linked
glycosylation sites. The corresponding native DNA coding sequence is identified herein as SEQ ID
NO:12.

(iv) Subtilisin BPN': Native proBPN' as produced in B. amyloliquefaciens is composed of
352 amino acids, having the sequence identified herein as SEQ ID NO:13. The corresponding native
DNA coding sequence is identified herein as SEQ ID NO:14. The proBPN' polypeptide contains a
77 amino acid "pro" moiety which is identified herein as SEQ ID NO:15. The remainder of the
polypeptide, which forms the mature active BPN', is a 275 amino acid sequence identified herein by
SEQ ID NO:16. Native BPN' as produced in Bacillus is not glycosylated.

A4. Codon-Optimized Coding Sequences 

In accordance with one aspect of the invention, it has been discovered that a severalfold
enhancement of expression level can be achieved in plant cell culture by modifying the native
coding sequence of a heterologous gene to contain predominantly or exclusively the highest-frequency
codons found in the plant cell host.

The method will be illustrated for expression of a heterologous gene in rice plant cells, it
being recognized that the method is generally applicable to any monocot. As a first step, a
representative set of known gene coding sequences from rice is assembled. The sequences are then
analyzed for codon frequency for each amino acid, and the most frequent codon is selected for each
amino acid. This approach differs from earlier reported codon matching methods, in which more
than one frequent codon is selected for at least some of the amino acids. The optimal codons
selected in this manner for rice and barley are shown in Table 1.



Table 1

Amino Acid    Rice Preferred Codon    Barley Preferred Codon
Ala A         GCC
Arg R         CGC
Asn N         AAC
Asp D         GAC
Cys C         UGC
Gln Q         CAG
Glu E         GAG
Gly G         GGC
His H         CAC
Ile I         AUC
Leu L         CUC
Lys K         AAG
Phe F         UUC
Pro P         CCG                     CCC
Ser S         AGC                     UCC
Thr T         ACC
Tyr Y         UAC
Val V         GUC                     GUG
stop          UAA                     UGA

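
The single-codon-per-amino-acid approach can be illustrated with a short back-translation sketch. It assumes the rice column of Table 1, with Ile read as AUC and Leu as CUC (AUG encodes Met, so it cannot be the Ile entry), and it adds the only standard codons for Met and Trp, which the table does not list. The function name and the test peptide are hypothetical.

```python
# Rice preferred codons as read from Table 1 (RNA alphabet).
RICE_PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "UGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "AUC",
    "L": "CUC", "K": "AAG", "F": "UUC", "P": "CCG", "S": "AGC",
    "T": "ACC", "Y": "UAC", "V": "GUC", "*": "UAA",
    # Not in Table 1: Met and Trp each have one codon in the standard code.
    "M": "AUG", "W": "UGG",
}


def back_translate(protein: str, table=RICE_PREFERRED, as_dna: bool = True) -> str:
    """Encode a protein using exactly one (most frequent) codon per amino acid.

    '*' marks the stop codon. Unknown residue letters raise KeyError.
    """
    rna = "".join(table[aa] for aa in protein.upper())
    return rna.replace("U", "T") if as_dna else rna


# Hypothetical peptide, used only to show the mapping.
print(back_translate("MKNTSS*"))
```

Because every amino acid maps to a single codon, the output is fully determined by the protein sequence, which is what distinguishes this scheme from codon-matching methods that draw on several frequent codons per amino acid.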

As indicated above, the fusion protein coding sequence in the chimeric gene is constructed
such that the final (C-terminal) codon in the signal sequence is immediately followed by the codon
for the N-terminal amino acid in the mature form of the heterologous protein. Exemplary fusion
protein genes, in accordance with the present invention, are identified herein as follows:

SEQ ID NO:18, corresponding to codon-optimized coding sequences of the fusion protein
consisting of RAmy3D signal sequence/mature α1-antitrypsin;

SEQ ID NO:19, corresponding to the fusion protein coding sequence consisting of the
codon-optimized RAmy3D signal sequence and the native mature antithrombin III sequence;

SEQ ID NO:20, corresponding to the fusion protein coding sequence consisting of the
codon-optimized RAmy3D signal sequence and the native mature human serum albumin sequence;

SEQ ID NO:21, corresponding to codon-optimized coding sequence of the fusion protein
RAmy3D signal sequence/prosubtilisin BPN'. In this instance, prosubtilisin is considered the
"mature" protein, in that secreted prosubtilisin can autocatalyze to active, mature subtilisin.

In a preferred embodiment, the BPN' coding sequence is further modified to eliminate
potential N-glycosylation sites, as native BPN' is not glycosylated. Table 2 illustrates preferred
codon substitutions, which eliminate all potential N-glycosylation sites in subtilisin BPN'. SEQ ID
NO:17 corresponds to a mature BPN' amino acid sequence containing the substitutions presented in
Table 2.


Table 2

N-Glycosylation Sites    Location (Asn) (in mature protein)    Amino Acid Substitution
Asn Asn Ser              61                                    Thr Asn Ser
Asn Asn Ser              76                                    Thr Asn Ser
Asn Met Ser              123                                   Thr Met Ser
Asn Gly Thr              218                                   Ser Gly Thr (a)
Asn Trp Thr              240                                   Thr Trp Thr

(a) Improved thermostability; Bryan, et al., Proteins: Structure, Function, and Genetics 1:326 (1986).
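
The substitutions in Table 2 each replace the Asn of an Asn-X-Ser/Thr tripeptide. A minimal sketch of how such potential sites might be located and removed is given below. It assumes the usual sequon rule (X is any residue except proline); the short fragment is hypothetical and is not taken from SEQ ID NO:16 or NO:17.

```python
import re

# Asn-X-Ser/Thr sequon, X != Pro. The lookahead lets overlapping sites be found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def substitute_asn(protein: str, position: int, new_residue: str) -> str:
    """Replace the Asn at a 1-based position with another residue."""
    i = position - 1
    if protein[i] != "N":
        raise ValueError(f"residue {position} is {protein[i]}, not Asn")
    return protein[:i] + new_residue + protein[i + 1:]


# Hypothetical fragment containing an Asn-Asn-Ser tripeptide.
fragment = "ALNNSIGV"
sites = find_sequons(fragment)
print(sites)

# Mirror the Table 2 pattern Asn Asn Ser -> Thr Asn Ser.
modified = substitute_asn(fragment, sites[0], "T")
print(modified, find_sequons(modified))
```

Re-scanning after each substitution, as in the last line, confirms that the change did not leave or create another sequon nearby.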
A5. Transcription and Translation Terminators

The chimeric gene may also include, downstream of the coding sequence, the 3'
untranslated region (3' UTR) from an inducible monocot gene, such as one of the rice or barley
α-amylase genes mentioned above. One preferred 3' UTR is that derived from the RAmy1A gene,
whose sequence is given by SEQ ID NO:6. This sequence includes non-coding sequence 5' to the
polyadenylation site, the polyadenylation site, and the transcription termination sequence. The
transcriptional termination region may be selected, particularly for stability of the mRNA to
enhance expression. Polyadenylation tails (Alber and Kawasaki, 1982, Mol. and Appl. Genet.
1:419-434) are also commonly added to the expression cassette to optimize high levels of
transcription and proper transcription termination, respectively. Polyadenylation sequences include
but are not limited to the Agrobacterium octopine synthetase signal (Gielen, et al., EMBO J. 1:835-
846 (1984)) or the nopaline synthase of the same species (Depicker, et al., Mol. Appl. Genet. 1:561-
573 (1982)).

Since the ultimate expression of the heterologous protein will be in a eukaryotic cell (in this
case, a member of the grass family), it is desirable to determine whether any portion of the cloned
gene contains sequences which will be processed out as introns by the host's splicing machinery. If
so, site-directed mutagenesis of the "intron" region may be conducted to prevent losing a portion of
the genetic message as a false intron code (Reed and Maniatis, Cell 41:95-105 (1985)).

Fig. 2 shows the elements of one preferred chimeric gene constructed in accordance with
the invention, and intended particularly for use in protein expression in a rice cell suspension
culture. The gene includes, in a 5' to 3' direction, the promoter from the RAmy3D gene, which is
inducible in cell culture with sugar depletion, the 5' UTR from the RAmy1A gene, which confers
enhanced stability on the gene transcript, the RAmy3D signal sequence coding region, as identified
above, the coding region of a heterologous protein to be produced, and a 3' UTR region from the
RAmy1A gene.



III. Plant Transformation

For transformation of plants, the chimeric gene is placed in a suitable expression vector
designed for operation in plants. The vector includes suitable elements of plasmid or viral origin
that provide necessary characteristics to the vector to permit the vectors to move DNA from bacteria
to the desired plant host. Suitable transformation vectors are described in related application PCT
WO 95/14099, published May 25, 1995, which is incorporated by reference herein. Suitable
components of the expression vector, including the chimeric gene described above, are discussed
below. One exemplary vector is the p3Dv1.0 vector described in Example 1.

A. Transformation Vector 

Vectors containing a chimeric gene of the present invention may also include selectable
markers for use in plant cells (such as the nptII kanamycin resistance gene, for selection in
kanamycin-containing medium, or the phosphinothricin acetyltransferase gene, for selection in medium
containing phosphinothricin (PPT)).

The vectors may also include sequences that allow their selection and propagation in a
secondary host, such as sequences containing an origin of replication and a selectable marker such
as antibiotic or herbicide resistance genes, e.g., HPH (Hagio et al., Plant Cell Reports 14:329
(1995); van den Elzen, Plant Mol. Biol. 5:299-302 (1985)). Typical secondary hosts include bacteria
and yeast. In one embodiment, the secondary host is Escherichia coli, the origin of replication is a
colE1-type, and the selectable marker is a gene encoding ampicillin resistance. Such sequences are
well known in the art and are commercially available as well (e.g., Clontech, Palo Alto, CA;
Stratagene, La Jolla, CA).

The vectors of the present invention may also be modified to intermediate plant
transformation plasmids that contain a region of homology to an Agrobacterium tumefaciens vector,
a T-DNA border region from Agrobacterium tumefaciens, and chimeric genes or expression
cassettes (described above). Further, the vectors of the invention may comprise a disarmed plant
tumor inducing plasmid of Agrobacterium tumefaciens.

The vector described in Example 1, and having a promoter from the RAmy3D gene, is
suitable for use in a method of mature protein production in cell culture, where the RAmy3D
promoter is induced by sugar depletion in cell culture medium. Other promoters may be selected
for other applications, as indicated above. For example, for mature protein expression in
germinating seeds, the coding sequence may be placed under the control of the rice α-amylase
RAmy1A promoter, which is inducible by gibberellic acid during seed germination.



B. Transformation of plant cells 

Various methods for direct or vectored transformation of plant cells, e.g., plant protoplast
cells, have been described, e.g., in above-cited PCT application WO 95/14099. As noted in that
reference, promoters directing expression of selectable markers used for plant transformation (e.g.,
nptII) should operate effectively in plant hosts. One such promoter is the nos promoter from native
Ti plasmids (Herrera-Estrella, et al., Nature 303:209-213 (1983)). Others include the 35S and 19S
promoters of cauliflower mosaic virus (Odell, et al., Nature 212:810-812 (1985)) and the 2'
promoter (Velten, et al., EMBO J. 1:2723-2730 (1984)).

In one preferred embodiment, the embryo and endosperm of mature seeds are removed to
expose scutellum tissue cells. The cells may be transformed by DNA bombardment or injection, or
by vectored transformation, e.g., by Agrobacterium infection after bombarding the scutellar cells
with microparticles to make them susceptible to Agrobacterium infection (Bidney et al., Plant Mol.
Biol. 18:301-313, 1992).

One preferred transformation follows the methods detailed generally in Sivamani, E. et al.,
Plant Cell Reports 15:465 (1996); Zhang, S., et al., Plant Cell Reports 15:465 (1996); and Li, L.,
et al., Plant Cell Reports 12:250 (1993). Briefly, rice seeds are sterilized by standard methods, and
callus induction from the seeds is carried out on NB media with 2,4-D. During a first incubation
period, callus tissue forms around the embryo of the seed. By the end of the incubation period
(e.g., 14 days at 28°C), the calli are about 0.25 to 0.5 cm in diameter. Callus mass is then detached
from the seed, placed on fresh NB media, and incubated again for about 14 days at 28°C. After
the second incubation period, satellite calli developed around the original "mother" callus mass.
These satellite calli were slightly smaller, more compact and defined than the original tissue. It was
these calli that were transferred to fresh media. The "mother" callus was not transferred. The goal was to
select only the strongest, most vigorous growing tissue for further culture.

Calli to be bombarded are selected from 14-day-old subcultures. The size, shape, color and
density are all important in selecting calli in the optimal physiological condition for transformation.
The calli should be between 0.8 and 1.1 mm in diameter. The calli should appear as spherical
masses with a rough exterior.

Transformation is by particle bombardment, as detailed in the references cited above. After
the transformation steps, the cells are typically grown under conditions that permit expression of the
selectable marker gene. In a preferred embodiment, the selectable marker gene is HPH. It is
preferred to culture the transformed cells under multiple rounds of selection to produce a uniformly
stable transformed cell line.



IV. Cell Culture Production of Mature Heterologous Protein 

Transgenic cells, typically callus cells, are cultured under conditions that favor plant cell
growth, until the cells reach a desired cell density, then under conditions that favor expression of
the mature protein under the control of the given promoter. Preferred culture conditions are
described below and in Example 2. Purification of the mature protein secreted into the medium is
by standard techniques known by those of skill in the art.

Production of mature AAT: In a preferred embodiment, the culture medium contains a
phosphate buffer, e.g., the 20 mM phosphate buffer, pH 6.8 described in Example 2, to reduce
AAT degradation catalyzed by metals. Alternatively, or in addition, a metal chelating agent, such
as EDTA, may be added to the medium.

Following the cell culture method described in Example 2, cell culture media was partially
purified and the fraction containing AAT was analyzed by Western blot, as shown in Fig. 4. The
first two lanes ("phosphate") show AAT bands both in the presence and absence of elastase ("+E"
and "-E"), where the higher molecular weight bands in the presence of elastase correspond roughly
to a 58-59 kdal AAT/elastase complex. Also as seen in the figure, expression was high in the
absence of sucrose, but nearly undetectable in the presence of sucrose.

To ascertain the degree of glycosylation (as determined by apparent molecular weight by
SDS-PAGE) the protein produced in culture was fractionated by SDS-PAGE and immunodetected
with a labeled antibody raised against the C-terminal portion of AAT, as shown in Fig. 5. Lane 4
contains human AAT, and its migration position corresponds to about 52 kdal. In lane 3 is the
plant-produced AAT, having an apparent molecular weight of about 49-50 kdal, indicating an extent
of glycosylation of up to 60-80% of the glycosylation found in human AAT (non-glycosylated AAT
has a molecular weight of 45 kdal).

Similar results are shown in the Western blots in Fig. 6. Lanes 1-3 in this figure
correspond to decreasing amounts (15, 10, and 5 ng) of human AAT; lane 4, to 10 µl supernatant
from a non-expressing plant cell line; lanes 5 and 6, to 10 µl supernatant from AAT-expressing
plant cell lines 11B and 27F, respectively; and lane 7, to 10 µl supernatant from cell line 27F plus
250 ng trypsin. The upward mobility shift in lane 7 is indicative of association between trypsin and
the plant-produced AAT.

The ability of plant-produced AAT to bind to elastase is demonstrated in Fig. 7, which
shows the shift in molecular weight over a 30 minute binding interval for the 52 kdal human AAT
(lanes 1-4) and the 49-50 kdal plant-produced AAT.

To demonstrate that the mature protein is produced in secreted form, with the desired
N-terminus, a chimeric gene constructed as above, and having the coding sequence for mature
α1-antitrypsin, was expressed and secreted in cell culture as described in Example 2. The isolated
protein was then sequenced at its N-terminal region, yielding the N-terminal sequence shown in Fig.
8. This sequence, which is identified herein as SEQ ID NO:22, has the same N-terminal residues
as native mature α1-antitrypsin.

Production of mature ATIII: In a preferred embodiment, the culture medium contains an
MES buffer, pH 6.8. Western blot analysis of the ATIII protein produced, shown in lanes 4 and 6
in Fig. 9, shows a band corresponding to ATIII (lane 1) in cell lines 42 and 46, when grown in the
absence (but not in the presence) of sucrose.

Production of mature BPN': In one embodiment of the invention, in which BPN' is secreted
as the proBPN' form of the enzyme, the chaperon "pro" moiety of the enzyme facilitates enzyme
folding and is cleaved from the enzyme, leaving the active mature form of BPN'. In another
embodiment, the mature enzyme is co-expressed and co-secreted with the "pro" chaperon moiety,
with conversion of the enzyme to active form occurring in the presence of the free chaperon (Eder et
al., Biochem. (1993) 32:18-26; Eder et al., (1993) J. Mol. Biol. 223:293-304). In yet another
embodiment of the invention, the BPN' is secreted in inactive form at a pH that may be in the 6-8
range, with subsequent activation of the inactive form, e.g., after enzyme isolation, by exposure to
the "pro" chaperon moiety, e.g., immobilized to a solid support.

In both of these embodiments, the culture medium is maintained at a pH of between 5 and
6, preferably about 5.5, during the period of active expression and secretion of BPN', to keep the
BPN', which is normally active at alkaline pH, at a pH below optimal activity.

Codon optimization to the host plant's most frequent codons yielded a severalfold
enhancement in the level of expressed heterologous protein in cell culture, as shown in Fig. 11. The
extent of enhancement is seen from the Western blot analysis shown in Fig. 10 for two cell lines
and further substantiated in Fig. 11. Lane 2 (second from left) in Fig. 10 shows a Western blot of
BPN' obtained in culture from cells transformed with a native proBPN' coding sequence. Two
bands are observed. The lower band corresponds to a protein whose approximately 35 kdal
molecular weight corresponds to that of proBPN'. The upper band corresponds to a somewhat
higher molecular weight species, possibly glycosylated.

The first lane in the figure shows BPN' polypeptides produced in culture by plant cells
transformed with the codon-optimized proBPN' sequence identified by SEQ ID NO:21. For
comparative purposes, the same volume of culture medium, adjusted for cell density, was applied in
both lanes 1 and 2. As seen, the amount of BPN' enzyme produced with a codon-optimized
sequence was severalfold higher than for subtilisin BPN' produced with the native coding sequence.
Further, a dark band or bands corresponding to mature peptide (molecular weight 27.5 kdal) was
observed. However, it should be noted that directly above the band at 35 kD is a more pronounced
band which may be pro mature product yet to be cleaved into active form.

Fig. 11 compares the specific activity of BPN' codon-optimized (AP106) versus BPN'
native (AP101) expression in rice callus cell culture, assayed using the chromogenic peptide
substrate suc-Ala-Ala-Pro-Phe-pNA as described by DelMar, E.G. et al. (1979; Anal. Biochem.
99:316-320). As shown in Fig. 11, several of the cell lines transformed with codon-optimized
chimeric genes produced levels of BPN', as evidenced by measured specific activity in culture
medium, that were 2-5 times the highest levels observed for plant cells transformed with native
proBPN' sequence.

In accordance with another aspect of the invention, it has been found that the transformed
plant cell culture is able to express and secrete BPN' at a cell culture pH, pH 5.5, which largely
inhibits self-degradation of mature, active BPN'. To assay for optimal pH conditions, the assay
disclosed in DelMar, et al. (supra) is used to test the media derived from BPN' transformed cell
lines under various pH conditions. Transformed rice callus cells are cultured in an MES medium
under similar conditions as disclosed in Example 2, but where the pH of the medium is maintained
at a selected pH between 5 and 8.0. At each pH, the total amount of expressed and secreted BPN'
is determined by Western blot analysis. BPN' activity can be tested in the assay described by
DelMar (supra).

V. Production of Mature Heterologous Protein in Germinating Seeds 

In this embodiment, monocot cells transformed as above are used to regenerate plants, seeds
from the plants are harvested and then germinated, and the mature protein is isolated from the
germinated seeds.

Plant regeneration from cultured protoplasts or callus tissue is carried out by standard methods,
e.g., as described in Evans et al., Handbook of Plant Cell Cultures, Vol. 1 (MacMillan
Publishing Co., New York, 1983); and Vasil I.R. (ed.), Cell Culture and Somatic Cell
Genetics of Plants, Acad. Press, Orlando, Vol. I, 1984, and Vol. III, 1986, and as described in
the above-cited PCT application.

A. Seed Germination Conditions

The transgenic seeds obtained from the regenerated plants are harvested, and prepared for
germination by an initial steeping step, in which the seeds are immersed in or sprayed with water to
increase the moisture content of the seed to between 35-45%. This initiates germination. Steeping
typically takes place in a steep tank which is typically fitted with a conical end to allow the seed to
flow freely out. The addition of compressed air to oxygenate the steeping process is an option.
The temperature is controlled at approximately 22°C depending on the seed.

After steeping, the seeds are transferred to a germination compartment which contains air
saturated with water and is under controlled temperature and air flows. The typical temperatures
are between 12-25°C and germination is permitted to continue for from 3 to 7 days.

Where the heterologous protein coding gene is operably linked to an inducible promoter
requiring a metabolite such as sugar or plant hormone, e.g., 2 to 100 µM gibberellic acid, this
metabolite is added, removed or depleted from the steeping water medium and/or is added to the
water saturated air used during germination. The seed absorbs the aqueous medium and begins to
germinate, expressing the heterologous protein. The medium may then be withdrawn and the
malting begun, by maintaining the seeds in a moist temperature controlled aerated environment. In
this way, the seeds may begin growth prior to expression, so that the expressed product is less
likely to be partially degraded or denatured during the process.

More specifically, the temperature during the imbibition or steeping phase will be
maintained in the range of about 15-25°C, while the temperature during the germination will usually
be about 20°C. The time for the imbibition will usually be from about 1 to 4 days, while the
germination time will usually be an additional 1 to 10 days, more usually 3 to 7 days. Usually, the
time for the malting does not exceed about ten days. The period for the malting can be reduced by
using plant hormones during the imbibition, particularly gibberellic acid.

To achieve maximum production of recombinant protein from malting, the malting
procedure may be modified to accommodate de-hulled and de-embryonated seeds, as described in
above-cited PCT application WO 95/14099. In the absence of sugars from the endosperm, there is
expected to be a 5 to 10 fold increase in RAmy3D promoter activity and thus expression of
heterologous protein. Alternatively, when embryoless half-seeds are incubated in 10 mM CaCl2 and
5 µM gibberellic acid, there is a 50 fold increase in RAmy1A promoter activity.

Production of mature HSA: Following the germination conditions as outlined above and
further detailed in Example 3, supernatant was analyzed by Western blot. Western blot analysis
shows production of HSA in germinating rice seeds, with seed samples taken 24, 72, and 120 hours
after induction with gibberellin. HSA production was highest approximately 24 hours post-induction
(lanes 3 and 4, Fig. 12). Bilirubin binding, a measure of correct folding of plant-produced
HSA, is assayed according to the method presented in Example 3.



VI. Production of Mature Heterologous Protein in Maturing Seeds

In this embodiment, monocot cells transformed as above are used to regenerate plants, and
seeds from the plants are allowed to mature, typically in the field, with consequent production of
heterologous protein in the seeds.

Following seed maturation, the seeds and their heterologous proteins may be used directly,
that is, without protein isolation, where for example, the heterologous protein is intended to confer
a benefit on the seed as a whole, for example, to enrich the seed in the selected protein.
Alternatively, the seeds may be fractionated by standard methods to obtain the heterologous
protein in enriched or purified form. In one general approach, the seed is first milled, then
suspended in a suitable extraction medium, e.g., an aqueous or an organic solvent, to extract the
protein or metabolite of interest. If desired, the heterologous protein can be further fractionated and
purified, using standard purification methods.



The following examples are provided by way of illustration only and not by way of
limitation. Those of skill will readily recognize a variety of noncritical parameters which could be
changed or modified to yield essentially similar results.



General Methods 

Generally, the nomenclature and laboratory procedures with respect to standard recombinant
DNA technology can be found in Sambrook, et al., Molecular Cloning - A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 1989 and in S.B.
Gelvin and R.A. Schilperoort, Plant Molecular Biology, 1988. Other general references are
provided throughout this document. The procedures therein are known in the art and are provided
for the convenience of the reader.



Example 1

Construction of a Transforming Vector Containing
a Codon-Optimized α1-antitrypsin Sequence

A. Hygromycin Resistance Gene Insertion:

The 3 kb BamHI fragment containing the 35S promoter-Hph-NOS was removed from the
plasmid pMON410 (Monsanto, St. Louis, MO) and placed into a site-directed mutagenized BglII
site in pUC18 at 1463 to form the plasmid pUCH18+.



B. Terminator Insertion:

pOSg1ABKS is a 5 kb BamHI-KpnI fragment from lambda clone λOSg1A (Huang, N., et
al., (1990) Nuc. Acids Res. 18:7007) cloned into pBluescript KS- (Stratagene, San Diego, CA).
Plasmid pOSg1ABKS was digested with MspI and blunted with T4 DNA polymerase followed by
SpeI digestion. The 350 bp terminator fragment was subcloned into pUC19 (New England
BioLabs, Beverly, MA), which had been digested with BamHI, blunted with T4 DNA polymerase
and digested with XbaI, to form pUC19/terminator.


C. RAmy3D Promoter Insertion:

A 1.1 kb NheI-PstI fragment derived from p1AS1.5 (Huang, N. et al. (1993) Plant Mol.
Biol. 23:737-747), was cloned into the vector pGEM5zf- [multiple cloning site (MCS) (Promega,
Madison, WI): ApaI, AatII, SphI, NcoI, SstII, EcoRV, SpeI, NotI, PstI, SalI, NdeI, SacI, MluI,
NsiI] at the SpeI and PstI sites to form pGEM5zf-(3D/NheI-PstI). pGEM5zf-(3D/NheI-PstI) was
then digested with PstI and SacI, and two non-kinased 30mers having the complementary sequences
5' GCTTG ACCTG TAACT CGGGC CAGGC GAGCT 3' (SEQ ID NO:23) and 5' CGCCT
AGCCC GAGTT ACAGG TCAAG CAGCT 3' (SEQ ID NO:24) were ligated in to form
p3DProSig. The promoter fragment prepared by digesting p3DProSig with NcoI, blunting with T4
DNA polymerase, and digesting with SstI was subcloned into pUC19/terminator which had been
digested with EcoRI, blunted with T4 DNA polymerase and digested with SstI, to form
p3DProSigEND.

D. Multiple Cloning Site Insertion:

p3DProSigEND was digested with SstI and SmaI followed by the ligation of a new synthetic
linker fragment constructed with the non-kinased complementary oligonucleotides 5' AGCTC
CATGG CCGTG GCTCG AGTCT AGACG CGTCC CC 3' (SEQ ID NO:25) and 5' GGGGA
CGCGT CTAGA CTCGA GCCAC GGCCA TGG 3' (SEQ ID NO:26) to form
p3DProSigENDlink.
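
As a hedged illustration of how the two linker oligonucleotides pair, the sketch below reverse-complements the second oligonucleotide and locates it within the first. It takes the first oligonucleotide as beginning AGCTC, and the restriction-site names are standard recognition sequences added for illustration, since the text does not name the sites carried by the linker.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Strip the 5-base grouping spaces and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


top = clean("AGCTC CATGG CCGTG GCTCG AGTCT AGACG CGTCC CC")     # SEQ ID NO:25
bottom = clean("GGGGA CGCGT CTAGA CTCGA GCCAC GGCCA TGG")       # SEQ ID NO:26

# The paired region of the top strand equals the reverse complement of the bottom.
paired = reverse_complement(bottom)
offset = top.find(paired)
five_prime_extension = top[:offset]
three_prime_extension = top[offset + len(paired):]

print(len(top), len(bottom), offset)
print("5' extension:", five_prime_extension, "| 3' extension:", repr(three_prime_extension))

# Standard recognition sequences (illustrative; not named in the text).
SITES = {"NcoI": "CCATGG", "XhoI": "CTCGAG", "XbaI": "TCTAGA", "MluI": "ACGCGT"}
print({name: site in top for name, site in SITES.items()})
```

Under these assumptions the second oligonucleotide pairs with all but the first four bases of the first, leaving a short single-stranded extension at one end and a flush end at the other.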

E. p3DProSigENDlink Flanking Site Modification:

p3DProSigENDlink was digested with SalI and blunted with T4 DNA polymerase followed
by EcoRV digestion. The blunt fragment was then inserted into pBluescript KS+ (Stratagene) in
the EcoRV site so that the HindIII site is proximal to the promoter and the EcoRI is proximal to the
terminator sequence. The HindIII-EcoRI fragment was then moved into the polylinker of
pUCH18+ to form the p3Dv1.0 expression vector.

F. RAmy1A Promoter Insertion:
A 1.9 kb NheI-PstI fragment derived from subclone pOSG2CA2.3 from lambda clone
λOSg2 (Huang et al. (1990) Plant Mol. Biol. 14:655-668) was cloned into the vector pGEM5zf- at
the SpeI and PstI sites to form pGEM5zf-(1A/NheI-PstI). pGEM5zf-(1A/NheI-PstI) was digested
with PstI and SacI and two non-kinased 35mers and four kinased 32mers were ligated in, with the
complementary sequences as follows: 5' GCATG CAGGT GCTGA ACACC ATGGT GAACA
AACAC 3' (SEQ ID NO:27); 5' TTCTT GTCCC TTTCG GTCCT CATCG TCCTC CT 3' (SEQ
ID NO:28); 5' TGGCC TCTCC TCCAA CTTGA CAGCC GGGAG CT 3' (SEQ ID NO:29); 5'
TTCAC CATGG TGTTC AGCAC CTGCA TGCTG CA 3' (SEQ ID NO:30); 5' CGATG AGGAC
CGAAA GGGAC AAGAA GTGTT TG 3' (SEQ ID NO:31); 5' CCCGG CTGTC AAGTT
GGAGG AGAGG CCAAG GAGGA 3' (SEQ ID NO:32) to form p1AProSig. The HindIII-SacI 0.8
kb promoter fragment was subcloned from p1AProSig into the p3Dv1.0 vector digested with
HindIII-SacI to yield the p1Av1.0 expression vector.
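
As a consistency check on the oligonucleotide assembly, the three top-strand oligonucleotides (SEQ ID NO:27 to NO:29) can be joined and translated. The sketch below is illustrative only: the function and variable names are assumptions, the start codon is located by assuming that the first 20 nucleotides of SEQ ID NO:27 correspond to the 3' end of the RAmy1A 5' untranslated region (SEQ ID NO:5), and the one-letter string is a transcription of SEQ ID NO:4.

```python
# Sketch: join the top-strand oligonucleotides (SEQ ID NO:27-29) and translate
# the open reading frame. Function and variable names are illustrative.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, indexed in TCAG order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate the complete codons of `dna` into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


SEQ_27 = "GCATGCAGGTGCTGAACACCATGGTGAACAAACAC"
SEQ_28 = "TTCTTGTCCCTTTCGGTCCTCATCGTCCTCCT"
SEQ_29 = "TGGCCTCTCCTCCAACTTGACAGCCGGGAGCT"

# Last 20 nucleotides of the RAmy1A 5' UTR (SEQ ID NO:5); assumed to mark
# the boundary between untranslated leader and the signal peptide ORF.
UTR_TAIL = "GCATGCAGGTGCTGAACACC"

top_strand = SEQ_27 + SEQ_28 + SEQ_29
assert top_strand.startswith(UTR_TAIL)

start = len(UTR_TAIL)                                   # first base of the ATG
signal_peptide = translate(top_strand[start:start + 75])  # 25 codons
trailing_bases = top_strand[start + 75:]                # bases after the ORF

# One-letter transcription of SEQ ID NO:4 (RAmy1A signal peptide).
SEQ_4 = "MVNKHFLSLSVLIVLLGLSSNLTAG"

if __name__ == "__main__":
    print("translated:", signal_peptide)
    print("matches SEQ ID NO:4:", signal_peptide == SEQ_4)
    print("bases following the signal peptide codons:", trailing_bases)
```

The same `translate` helper can be applied to any of the coding sequences in the sequence listing, provided the reading frame is known.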

G. Construction of p3D-AAT Plasmid
Two PCR primers were used to amplify a fragment encoding AAT according to the
sequence disclosed as Genbank Accession No. K01396: N-terminal primer 5' GAGGA TCCCC
AGGGA GATGC TGCCC AGAA 3' (SEQ ID NO:33) and C-terminal primer 5' CGCGC TCGAG
TTATT TTTGG GTGGG ATTCA CCAC 3' (SEQ ID NO:34). The N-terminal primer amplifies to
a blunt site for in-frame insertion with the end of the p3D signal peptide and the C-terminal primer
contains a XhoI site for cloning the fragment into the vector as shown in Figs. 3A and 3B.
Alternatively, the sequence encoding mature AAT (SEQ ID NO:8) or codon-optimized AAT may be
chemically synthesized using techniques known in the art, incorporating a XhoI restriction site 3' of
the termination codon for insertion into the expression vector as described above.
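
The relationship between the two PCR primers and the mature AAT coding sequence can be illustrated as follows. This is a hedged sketch rather than the procedure used in the example: the variable names are assumptions, and only the first 30 and last 25 nucleotides of SEQ ID NO:8, as printed in the sequence listing, are used.

```python
# Sketch: compare the AAT primers (SEQ ID NO:33, NO:34) with the ends of the
# mature AAT coding sequence (SEQ ID NO:8). Names are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


FORWARD_PRIMER = "GAGGATCCCCAGGGAGATGCTGCCCAGAA"        # SEQ ID NO:33
REVERSE_PRIMER = "CGCGCTCGAGTTATTTTTGGGTGGGATTCACCAC"   # SEQ ID NO:34
XHOI_SITE = "CTCGAG"                                    # XhoI recognition sequence

# First 30 and last 25 nucleotides of SEQ ID NO:8 from the sequence listing.
AAT_5_PRIME = "GAGGATCCCCAGGGAGATGCTGCCCAGAAG"
AAT_3_PRIME = "AGTGGTGAATCCCACCCAAAAATAA"

# Split the reverse primer into its 5' tail (through the XhoI site) and the
# part expected to anneal to the template.
site_end = REVERSE_PRIMER.find(XHOI_SITE) + len(XHOI_SITE)
tail_with_site = REVERSE_PRIMER[:site_end]
annealing_part = REVERSE_PRIMER[site_end:]

forward_matches = AAT_5_PRIME.startswith(FORWARD_PRIMER)
reverse_matches = AAT_3_PRIME.endswith(reverse_complement(annealing_part))
# On the coding strand, the annealing part ends with the stop codon.
ends_with_stop = reverse_complement(annealing_part).endswith("TAA")

if __name__ == "__main__":
    print("forward primer matches 5' end of SEQ ID NO:8:", forward_matches)
    print("reverse primer matches 3' end of SEQ ID NO:8:", reverse_matches)
    print("5' tail carrying the XhoI site:", tail_with_site)
    print("amplified fragment ends with TAA:", ends_with_stop)
```

In this reading, the forward primer begins exactly at the first codon of mature AAT, and the reverse primer places the XhoI site immediately 3' of the termination codon.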

Example 2

Production of mature α1-antitrypsin in cell culture
After selection of transgenic callus, callus cells were suspended in liquid culture containing
AA2 media (Thompson, J.A., et al., Plant Science 47:123 (1986)), at 3% sucrose, pH 5.8.
Thereafter, the cells were shifted to phosphate-buffered media (20 mM phosphate buffer, pH 6.8)
using 10 mL multi-well tissue culture plates and shaken at 120 rpm in the dark for 48 hours. The
supernatant was then removed and stored at -80°C prior to western blot analysis.
Supernatants were concentrated using Centricon-10 filters (Amicon cat. #4207) and washed
with induction media to remove substances interfering with electrophoretic migration. Samples
were concentrated approximately 10-fold, and mature AAT was purified by SDS-PAGE.
The purified protein was extracted from the electrophoresis medium and
sequenced at its N-terminus, giving the sequence shown in Fig. 8, identified herein as SEQ ID
NO:22.
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Example 3 
HSA Induction in Germinating Seeds
After selection of transgenic plants which tested positive for the presence of a codon-optimized
HSA gene driven by the GA3-responsive RAmy1A promoter, seeds were harvested and
imbibed for 24 hours with 100 rpm orbital shaking in the dark at IS^C. GA3 was added to a final
concentration of 5 µM and incubated for an additional 24-120 hours. Total soluble protein was
isolated by double grinding each seed in 120 µl grinding buffer and centrifuging at 23,000 x g for 1
minute at 4°C. The clear supernatant was carefully removed from the pellet and transferred to a
fresh tube.

Bilirubin binding assay

Bilirubin binding to its high-affinity site on mature HSA is assayed using the method
described by Jacobsen, J. et al. (1974; Clin. Chem. 20:783) and Reed, R.G. et al. (1975;
Biochemistry 14:4578-4583). Briefly, the concentration of free bilirubin in equilibrium with
protein-bound bilirubin is determined by the rate of peroxide-peroxidase catalyzed oxidation of free
bilirubin. Stock solutions of bilirubin (Nutritional Biochemicals Corp.) are prepared fresh daily in
5 mM NaOH containing 1 mM EDTA and the concentration determined using a molar absorptivity
of 47,500 M⁻¹ cm⁻¹ at 440 nm. An aliquot containing between 5 and 30 nmol bilirubin is added to a
1 cm cuvette containing 1 ml PBS and approximately 30 nmol HSA at 37°C. An absorbance
spectrum between 500 and 350 nm is recorded. Aliquots of horseradish peroxidase (Sigma), 0.05
mg/ml in PBS, and 0.05% ethyl hydrogen peroxide (Ferrosan; Malmo, Sweden) are added and the
change in absorbance at λmax is recorded for 3-5 minutes. The concentrations of free and bound
bilirubin calculated from the oxidation rate observed using varying concentrations of total bilirubin
are used to construct a Scatchard plot from which the association constant for a single binding site is
determined.
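
The two calculations in this assay, the stock concentration from absorbance and the association constant from a Scatchard plot, can be sketched as below. This is an illustrative sketch only: the function names are assumptions, the data points are synthetic values generated from an arbitrary association constant, and no measured results are implied.

```python
# Sketch of the assay arithmetic: Beer-Lambert concentration and a Scatchard
# fit. All data below are synthetic and used only to exercise the functions.

EPSILON_440 = 47_500.0   # M^-1 cm^-1, molar absorptivity stated in the text
PATH_CM = 1.0            # 1 cm cuvette


def bilirubin_molar(absorbance_440, epsilon=EPSILON_440, path_cm=PATH_CM):
    """Beer-Lambert law: concentration (mol/L) = A / (epsilon * path)."""
    return absorbance_440 / (epsilon * path_cm)


def scatchard_fit(free_molar, bound_per_protein):
    """Fit a straight line to the Scatchard plot of nu/[free] against nu.

    For n identical, independent sites: nu/[free] = n*K - K*nu, so the
    association constant is K = -slope and n = intercept / K.
    Returns (K, n).
    """
    xs = list(bound_per_protein)
    ys = [nu / free for nu, free in zip(bound_per_protein, free_molar)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    k_assoc = -slope
    return k_assoc, intercept / k_assoc


# Hypothetical, noise-free data for a single site with an arbitrary K.
ASSUMED_K = 1.0e8                      # M^-1, illustration value only
nus = [0.2, 0.4, 0.6, 0.8]             # mol bilirubin bound per mol HSA
frees = [nu / (ASSUMED_K * (1.0 - nu)) for nu in nus]

k_est, n_est = scatchard_fit(frees, nus)

if __name__ == "__main__":
    print("stock concentration for A440 = 0.475:", bilirubin_molar(0.475), "M")
    print("fitted K:", k_est, "M^-1; fitted n:", n_est)
```

With real data the points scatter about the line, and curvature in the plot would indicate more than one class of binding site; the straight-line fit applies only to the single high-affinity site described above.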

Although the invention has been described with reference to particular embodiments, it will 
be appreciated that a variety of changes and modifications can be made without departing from the 
invention. 

SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: Applied Phytologics, Inc.

(ii) TITLE OF THE INVENTION: Production of Mature Proteins in Plants

(iii) NUMBER OF SEQUENCES: 34

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Dehlinger & Associates
(B) STREET: P.O. Box 60850
(C) CITY: Palo Alto
(D) STATE: CA
(E) COUNTRY: USA
(F) ZIP: 94306

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ for Windows Version 2.0

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US98/03068
(B) FILING DATE: 13-FEB-1998
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 60/038,169
(B) FILING DATE: 13-FEB-1997
(A) APPLICATION NUMBER: 60/037,991
(B) FILING DATE: 13-FEB-1997
(A) APPLICATION NUMBER: 60/038,170
(B) FILING DATE: 13-FEB-1997
(A) APPLICATION NUMBER: 60/038,168
(B) FILING DATE: 13-FEB-1997

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Petithory, Joanne R
(B) REGISTRATION NUMBER: P42,995
(C) REFERENCE/DOCKET NUMBER: 0665-0007.41

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 650-324-0880
(B) TELEFAX: 650-324-0960

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(vii) IMMEDIATE SOURCE:
(B) CLONE: 3D signal peptide sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Met Lys Asn Thr Ser Ser Leu Cys Leu Leu Leu Leu Val Val Leu Cys
1               5                   10                  15

Ser Leu Thr Cys Asn Ser Gly Gln Ala
            20                  25

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 75 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: native 3D signal peptide DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

ATGAAGAACA CCAGCAGCTT GTGTTTGCTG CTCCTCGTGG TGCTCTGCAG CTTGACCTGT 60
AACTCGGGCC AGGCG 75

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 75 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: codon-optimized 3D signal peptide DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ATGAAGAACA CCTCCTCCCT CTGCCTCCTG CTGCTCGTGG TCCTCTGCTC CCTGACCTGC 60
AACAGCGGCC AGGCC 75

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(vii) IMMEDIATE SOURCE:
(B) CLONE: RAmy1A signal peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Val Asn Lys His Phe Leu Ser Leu Ser Val Leu Ile Val Leu Leu
1               5                   10                  15

Gly Leu Ser Ser Asn Leu Thr Ala Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: RAmy1A 5' untranslated region (UTR)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

ATCAATCATC CATCTCCGAA GTGTGTCTGC AGCATGCAGG TGCTGAACAC C 51
(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 321 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: RAmy1A 3' untranslated region (UTR)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

GCGCACGATG ACGAGACTCT CAGTTTAGCA GATTTAACCT GCGATTTTTA CCCTGACCGG 60
TATACGTATA TACGTGCCGG CAACGAGCTG TATCCGATCC GAATTACGGA TGCAATTGTC 120
CACGAAGTAC TTCCTCCGTA AATAAAGTAG GATCAGGGAC ATACATTTGT ATGGTTTTAC 180
GAATAATGCT ATGCAATAAA ATTTGCACTG CTTAATGCTT ATGCATTTTT GCTTGGTTCG 240
ATTGTACTGG TGAATTATTG TTACTGTTCT TTTTACTTCT CGAGTGGCAG TATTGTTCTT 300
CTACGAAAAT TTGATGCGTA G 321

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 394 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vii) IMMEDIATE SOURCE:
(B) CLONE: mature AAT amino acid sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Glu Asp Pro Gln Gly Asp Ala Ala Gln Lys Thr Asp Thr Ser His His
1               5                   10                  15

Asp Gln Asp His Pro Thr Phe Asn Lys Ile Thr Pro Asn Leu Ala Glu
            20                  25                  30

Phe Ala Phe Ser Leu Tyr Arg Gln Leu Ala His Gln Ser Asn Ser Thr
        35                  40                  45

Asn Ile Phe Phe Ser Pro Val Ser Ile Ala Thr Ala Phe Ala Met Leu
    50                  55                  60

Ser Leu Gly Thr Lys Ala Asp Thr His Asp Glu Ile Leu Glu Gly Leu
65                  70                  75                  80

Asn Phe Asn Leu Thr Glu Ile Pro Glu Ala Gln Ile His Glu Gly Phe
                85                  90                  95

Gln Glu Leu Leu Arg Thr Leu Asn Gln Pro Asp Ser Gln Leu Gln Leu
            100                 105                 110

Thr Thr Gly Asn Gly Leu Phe Leu Ser Glu Gly Leu Lys Leu Val Asp
        115                 120                 125

Lys Phe Leu Glu Asp Val Lys Lys Leu Tyr His Ser Glu Ala Phe Thr
    130                 135                 140

Val Asn Phe Gly Asp Thr Glu Glu Ala Lys Lys Gln Ile Asn Asp Tyr
145                 150                 155                 160

Val Glu Lys Gly Thr Gln Gly Lys Ile Val Asp Leu Val Lys Glu Leu
                165                 170                 175

Asp Arg Asp Thr Val Phe Ala Leu Val Asn Tyr Ile Phe Phe Lys Gly
            180                 185                 190

Lys Trp Glu Arg Pro Phe Glu Val Lys Asp Thr Glu Glu Glu Asp Phe
        195                 200                 205

His Val Asp Gln Val Thr Thr Val Lys Val Pro Met Met Lys Arg Leu
    210                 215                 220

Gly Met Phe Asn Ile Gln His Cys Lys Lys Leu Ser Ser Trp Val Leu
225                 230                 235                 240

Leu Met Lys Tyr Leu Gly Asn Ala Thr Ala Ile Phe Phe Leu Pro Asp
                245                 250                 255

Glu Gly Lys Leu Gln His Leu Glu Asn Glu Leu Thr His Asp Ile Ile
            260                 265                 270

Thr Lys Phe Leu Glu Asn Glu Asp Arg Arg Ser Ala Ser Leu His Leu
        275                 280                 285

Pro Lys Leu Ser Ile Thr Gly Thr Tyr Asp Leu Lys Ser Val Leu Gly
    290                 295                 300

Gln Leu Gly Ile Thr Lys Val Phe Ser Asn Gly Ala Asp Leu Ser Gly
305                 310                 315                 320

Val Thr Glu Glu Ala Pro Leu Lys Leu Ser Lys Ala Val His Lys Ala
                325                 330                 335

Val Leu Thr Ile Asp Glu Lys Gly Thr Glu Ala Ala Gly Ala Met Phe
            340                 345                 350

Leu Glu Ala Ile Pro Met Ser Ile Pro Pro Glu Val Lys Phe Asn Lys
        355                 360                 365

Pro Phe Val Phe Leu Met Ile Glu Gln Asn Thr Lys Ser Pro Leu Phe
    370                 375                 380

Met Gly Lys Val Val Asn Pro Thr Gln Lys
385                 390























(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1185 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: native coding sequence of mature AAT

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

GAGGATCCCC AGGGAGATGC TGCCCAGAAG ACAGATACAT CCCACCATGA TCAGGATCAC 60
CCAACCTTCA ACAAGATCAC CCCCAACCTG GCTGAGTTCG CCTTCAGCCT ATACCGCCAG 120
CTGGCACACC AGTCCAACAG CACCAATATC TTCTTCTCCC CAGTGAGCAT CGCTACAGCC 180
TTTGCAATGC TCTCCCTGGG GACCAAGGCT GACACTCACG ATGAAATCCT GGAGGGCCTG 240
AATTTCAACC TCACGGAGAT TCCGGAGGCT CAGATCCATG AAGGCTTCCA GGAACTCCTC 300
CGTACCCTCA ACCAGCCAGA CAGCCAGCTC CAGCTGACCA CCGGCAATGG CCTGTTCCTC 360
AGCGAGGGCC TGAAGCTAGT GGATAAGTTT TTGGAGGATG TTAAAAAGTT GTACCACTCA 420
GAAGCCTTCA CTGTCAACTT CGGGGACACC GAAGAGGCCA AGAAACAGAT CAACGATTAC 480
GTGGAGAAGG GTACTCAAGG GAAAATTGTG GATTTGGTCA AGGAGCTTGA CAGAGACACA 540
GTTTTTGCTC TGGTGAATTA CATCTTCTTT AAAGGCAAAT GGGAGAGACC CTTTGAAGTC 600
AAGGACACCG AGGAAGAGGA CTTCCACGTG GACCAGGTGA CCACCGTGAA GGTGCCTATG 660
ATGAAGCGTT TAGGCATGTT TAACATCCAG CACTGTAAGA AGCTGTCCAG CTGGGTGCTG 720
CTGATGAAAT ACCTGGGCAA TGCCACCGCC ATCTTCTTCC TGCCTGATGA GGGGAAACTA 780
CAGCACCTGG AAAATGAACT CACCCACGAT ATCATCACCA AGTTCCTGGA AAATGAAGAC 840
AGAAGGTCTG CCAGCTTACA TTTACCCAAA CTGTCCATTA CTGGAACCTA TGATCTGAAG 900
AGCGTCCTGG GTCAACTGGG CATCACTAAG GTCTTCAGCA ATGGGGCTGA CCTCTCCGGG 960
GTCACAGAGG AGGCACCCCT GAAGCTCTCC AAGGCCGTGC ATAAGGCTGT GCTGACCATC 1020
GACGAGAAAG GGACTGAAGC TGCTGGGGCC ATGTTTTTAG AGGCCATACC CATGTCTATC 1080
CCCCCCGAGG TCAAGTTCAA CAAACCCTTT GTCTTCTTAA TGATTGAACA AAATACCAAG 1140
TCTCCCCTCT TCATGGGAAA AGTGGTGAAT CCCACCCAAA AATAA 1185



(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 432 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(vii) IMMEDIATE SOURCE:
(B) CLONE: mature ATIII aa sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

His Gly Ser Pro Val Asp Ile Cys Thr Ala Lys Pro Arg Asp Ile Pro
1               5                   10                  15

Met Asn Pro Met Cys Ile Tyr Arg Ser Pro Glu Lys Lys Ala Thr Glu
            20                  25                  30

Asp Glu Gly Ser Glu Gln Lys Ile Pro Glu Ala Thr Asn Arg Arg Val
        35                  40                  45

Trp Glu Leu Ser Lys Ala Asn Ser Arg Phe Ala Thr Thr Phe Tyr Gln
    50                  55                  60

His Leu Ala Asp Ser Lys Asn Asp Asn Asp Asn Ile Phe Leu Ser Pro
65                  70                  75                  80

Leu Ser Ile Ser Thr Ala Phe Ala Met Thr Lys Leu Gly Ala Cys Asn
                85                  90                  95

Asp Thr Leu Gln Gln Leu Met Glu Val Phe Lys Phe Asp Thr Ile Ser
            100                 105                 110

Glu Lys Thr Ser Asp Gln Ile His Phe Phe Phe Ala Lys Leu Asn Cys
        115                 120                 125

Arg Leu Tyr Arg Lys Ala Asn Lys Ser Ser Lys Leu Val Ser Ala Asn
    130                 135                 140

Arg Leu Phe Gly Asp Lys Ser Leu Thr Phe Asn Glu Thr Tyr Gln Asp
145                 150                 155                 160

Ile Ser Glu Leu Val Tyr Gly Ala Lys Leu Gln Pro Leu Asp Phe Lys
                165                 170                 175

Glu Asn Ala Glu Gln Ser Arg Ala Ala Ile Asn Lys Trp Val Ser Asn
            180                 185                 190

Lys Thr Glu Gly Arg Ile Thr Asp Val Ile Pro Ser Glu Ala Ile Asn
        195                 200                 205

Glu Leu Thr Val Leu Val Leu Val Asn Thr Ile Tyr Phe Lys Gly Leu
    210                 215                 220

Trp Lys Ser Lys Phe Ser Pro Glu Asn Thr Arg Lys Glu Leu Phe Tyr
225                 230                 235                 240

Lys Ala Asp Gly Glu Ser Cys Ser Ala Ser Met Met Tyr Gln Glu Gly
                245                 250                 255

Lys Phe Arg Tyr Arg Arg Val Ala Glu Gly Thr Gln Val Leu Glu Leu
            260                 265                 270

Pro Phe Lys Gly Asp Asp Ile Thr Met Val Leu Ile Leu Pro Lys Pro
        275                 280                 285

Glu Lys Ser Leu Ala Lys Val Glu Lys Glu Leu Thr Pro Glu Val Leu
    290                 295                 300

Gln Glu Trp Leu Asp Glu Leu Glu Glu Met Met Leu Val Val His Met
305                 310                 315                 320

Pro Arg Phe Arg Ile Glu Asp Gly Phe Ser Leu Lys Glu Gln Leu Gln
                325                 330                 335

Asp Met Gly Leu Val Asp Leu Phe Ser Pro Glu Lys Ser Lys Leu Pro
            340                 345                 350

Gly Ile Val Ala Glu Gly Arg Asp Asp Leu Tyr Val Ser Asp Ala Phe
        355                 360                 365

His Lys Ala Phe Leu Glu Val Asn Glu Glu Gly Ser Glu Ala Ala Ala
    370                 375                 380

Ser Thr Ala Val Val Ile Ala Gly Arg Ser Leu Asn Pro Asn Arg Val
385                 390                 395                 400

Thr Phe Lys Ala Asn Arg Pro Phe Leu Val Phe Ile Arg Glu Val Pro
                405                 410                 415

Leu Asn Thr Ile Ile Phe Met Gly Arg Val Ala Asn Pro Cys Val Lys
            420                 425                 430



(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1299 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: native ATIII DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

CACGGAAGCC CTGTGGACAT CTGCACAGCC AAGCCGCGGG ACATTCCCAT GAATCCCATG 60
TGCATTTACC GCTCCCCGGA GAAGAAGGCA ACTGAGGATG AGGGCTCAGA ACAGAAGATC 120
CCGGAGGCCA CCAACCGGCG TGTCTGGGAA CTGTCCAAGG CCAATTCCCG CTTTGCTACC 180
ACTTTCTATC AGCACCTGGC AGATTCCAAG AATGACAATG ATAACATTTT CCTGTCACCC 240
CTGAGTATCT CCACGGCTTT TGCTATGACC AAGCTGGGTG CCTGTAATGA CACCCTCCAG 300
CAACTGATGG AGGTATTTAA GTTTGACACC ATATCTGAGA AAACATCTGA TCAGATCCAC 360
TTCTTCTTTG CCAAACTGAA CTGCCGACTC TATCGAAAAG CCAACAAATC CTCCAAGTTA 420
GTATCAGCCA ATCGCCTTTT TGGAGACAAA TCCCTTACCT TCAATGAGAC CTACCAGGAC 480
ATCAGTGAGT TGGTATATGG AGCCAAGCTC CAGCCCCTGG ACTTCAAGGA AAATGCAGAG 540
CAATCCAGAG CGGCCATCAA CAAATGGGTG TCCAATAAGA CCGAAGGCCG AATCACCGAT 600
GTCATTCCCT CGGAAGCCAT CAATGAGCTC ACTGTTCTGG TGCTGGTTAA CACCATTTAC 660
TTCAAGGGCC TGTGGAAGTC AAAGTTCAGC CCTGAGAACA CAAGGAAGGA ACTGTTCTAC 720
AAGGCTGATG GAGAGTCGTG TTCAGCATCT ATGATGTACC AGGAAGGCAA GTTCCGTTAT 780
CGGCGCGTGG CTGAAGGCAC CCAGGTGCTT GAGTTGCCCT TCAAAGGTGA TGACATCACC 840
ATGGTCCTCA TCTTGCCCAA GCCTGAGAAG AGCCTGGCCA AGGTGGAGAA GGAACTCACC 900
CCAGAGGTGC TGCAGGAGTG GCTGGATGAA TTGGAGGAGA TGATGCTGGT GGTTCACATG 960
CCCCGCTTCC GCATTGAGGA CGGCTTCAGT TTGAAGGAGC AGCTGCAAGA CATGGGCCTT 1020
GTCGATCTGT TCAGCCCTGA AAAGTCCAAA CTCCCAGGTA TTGTTGCAGA AGGCCGAGAT 1080
GACCTCTATG TCTCAGATGC ATTCCATAAG GCATTTCTTG AGGTAAATGA AGAAGGCAGT 1140
GAAGCAGCTG CAAGTACCGC TGTTGTGATT GCTGGCCGTT CGCTAAACCC CAACAGGGTG 1200
ACTTTCAAGG CCAACAGGCC CTTCCTGGTT TTTATAAGAG AAGTTCCTCT GAACACTATT 1260
ATCTTCATGG GCAGAGTAGC CAACCCTTGT GTTAAGTAA 1299

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 585 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(vii) IMMEDIATE SOURCE:
(B) CLONE: mature HSA amino acid sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Asp Ala His Lys Ser Glu Val Ala His Arg Phe Lys Asp Leu Gly Glu
1               5                   10                  15

Glu Asn Phe Lys Ala Leu Val Leu Ile Ala Phe Ala Gln Tyr Leu Gln
            20                  25                  30

Gln Cys Pro Phe Glu Asp His Val Lys Leu Val Asn Glu Val Thr Glu
        35                  40                  45

Phe Ala Lys Thr Cys Val Ala Asp Glu Ser Ala Glu Asn Cys Asp Lys
    50                  55                  60

Ser Leu His Thr Leu Phe Gly Asp Lys Leu Cys Thr Val Ala Thr Leu
65                  70                  75                  80

Arg Glu Thr Tyr Gly Glu Met Ala Asp Cys Cys Ala Lys Gln Glu Pro
                85                  90                  95

Glu Arg Asn Glu Cys Phe Leu Gln His Lys Asp Asp Asn Pro Asn Leu
            100                 105                 110

Pro Arg Leu Val Arg Pro Glu Val Asp Val Met Cys Thr Ala Phe His
        115                 120                 125

Asp Asn Glu Glu Thr Phe Leu Lys Lys Tyr Leu Tyr Glu Ile Ala Arg
    130                 135                 140

Arg His Pro Tyr Phe Tyr Ala Pro Glu Leu Leu Phe Phe Ala Lys Arg
145                 150                 155                 160

Tyr Lys Ala Ala Phe Thr Glu Cys Cys Gln Ala Ala Asp Lys Ala Ala
                165                 170                 175

Cys Leu Leu Pro Lys Leu Asp Glu Leu Arg Asp Glu Gly Lys Ala Ser
            180                 185                 190

Ser Ala Lys Gln Arg Leu Lys Cys Ala Ser Leu Gln Lys Phe Gly Glu
        195                 200                 205

Arg Ala Phe Lys Ala Trp Ala Val Ala Arg Leu Ser Gln Arg Phe Pro
    210                 215                 220

Lys Ala Glu Phe Ala Glu Val Ser Lys Leu Val Thr Asp Leu Thr Lys
225                 230                 235                 240

Val His Thr Glu Cys Cys His Gly Asp Leu Leu Glu Cys Ala Asp Asp
                245                 250                 255

Arg Ala Asp Leu Ala Lys Tyr Ile Cys Glu Asn Gln Asp Ser Ile Ser
            260                 265                 270

Ser Lys Leu Lys Glu Cys Cys Glu Lys Pro Leu Leu Glu Lys Ser His
        275                 280                 285

Cys Ile Ala Glu Val Glu Asn Asp Glu Met Pro Ala Asp Leu Pro Ser
    290                 295                 300

Leu Ala Ala Asp Phe Val Glu Ser Lys Asp Val Cys Lys Asn Tyr Ala
305                 310                 315                 320

Glu Ala Lys Asp Val Phe Leu Gly Met Phe Leu Tyr Glu Tyr Ala Arg
                325                 330                 335

Arg His Pro Asp Tyr Ser Val Val Leu Leu Leu Arg Leu Ala Lys Thr
            340                 345                 350

Tyr Glu Thr Thr Leu Glu Lys Cys Cys Ala Ala Ala Asp Pro His Glu
        355                 360                 365

Cys Tyr Ala Lys Val Phe Asp Glu Phe Lys Pro Leu Val Glu Glu Pro
    370                 375                 380

Gln Asn Leu Ile Lys Gln Asn Cys Glu Leu Phe Lys Gln Leu Gly Glu
385                 390                 395                 400

Tyr Lys Phe Gln Asn Ala Leu Leu Val Arg Tyr Thr Lys Lys Val Pro
                405                 410                 415

Gln Val Ser Thr Pro Thr Leu Val Glu Val Ser Arg Asn Leu Gly Lys
            420                 425                 430

Val Gly Ser Lys Cys Cys Lys His Pro Glu Ala Lys Arg Met Pro Cys
        435                 440                 445

Ala Glu Asp Tyr Leu Ser Val Val Leu Asn Gln Leu Cys Val Leu His
    450                 455                 460

Glu Lys Thr Pro Val Ser Asp Arg Val Thr Lys Cys Cys Thr Glu Ser
465                 470                 475                 480

Leu Val Asn Arg Arg Pro Cys Phe Ser Ala Leu Glu Val Asp Glu Thr
                485                 490                 495

Tyr Val Pro Lys Glu Phe Asn Ala Glu Thr Phe Thr Phe His Ala Asp
            500                 505                 510

Ile Cys Thr Leu Ser Glu Lys Glu Arg Gln Ile Lys Lys Gln Thr Ala
        515                 520                 525

Leu Val Glu Leu Val Lys His Lys Pro Lys Ala Thr Lys Glu Gln Leu
    530                 535                 540

Lys Ala Val Met Asp Asp Phe Ala Ala Phe Val Glu Lys Cys Cys Lys
545                 550                 555                 560

Ala Asp Asp Lys Glu Thr Cys Phe Ala Glu Glu Gly Lys Lys Leu Val
                565                 570                 575

Ala Ala Ser Gln Ala Ala Leu Gly Leu
            580                 585

















(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1865 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: native coding sequence of mature HSA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

AGATGCACAC AAGAGTGAGG TTGCTCATCG GTTTAAAGAT TTGGGAGAAG AAAATTTCAA 60
AGCCTTGGTG TTGATTGCCT TTGCTCAGTA TCTTCAGCAG TGTCCATTTG AAGATCATGT 120
AAAATTAGTG AATGAAGTAA CTGAATTTGC AAAAACATGT GTAGCTGATG AGTCAGCTGA 180
AAATTGTGAC AAATCACTTC ATACCCTTTT TGGAGACAAA TTATGCACAG TTGCAACTCT 240
TCGTGAAACC TATGGTGAAA TGGCTGACTG CTGTGCAAAA CAAGAACCTG AGAGAAATGA 300
ATGCTTCTTG CAACACAAAG ATGACAACCC AAACCTCCCC CGATTGGTGA GACCAGAGGT 360
TGATGTGATG TGCACTGCTT TTCATGACAA TGAAGAGACA TTTTTGAAAA AATACTTATA 420
TGAAATTGCC AGAAGACATC CTTACTTTTA TGCCCCGGAA CTCCTTTTCT TTGCTAAAAG 480
GTATAAAGCT GCTTTTACAG AATGTTGCCA AGCTGCTGAT AAAGCTGCCT GCCTGTTGCC 540
AAAGCTCGAT GAACTTCGGG ATGAAGGGAA GGCTTCGTCT GCCAAACAGA GACTCAAATG 600
TGCCAGTCTC CAAAAATTTG GAGAAAGAGC TTTCAAAGCA TGGGCAGTGG CTCGCCTGAG 660
CCAGAGATTT CCCAAAGCTG AGTTTGCAGA AGTTTCCAAG TTAGTGACAG ATCTTACCAA 720
AGTCCACACG GAATGCTGCC ATGGAGATCT GCTTGAATGT GCTGATGACA GGGCGGACCT 780
TGCCAAGTAT ATCTGTGAAA ATCAGGATTC GATCTCCAGT AAACTGAAGG AATGCTGTGA 840
AAAACCTCTG TTGGAAAAAT CCCACTGCAT TGCCGAAGTG GAAAATGATG AGATGCCTGC 900
TGACTTGCCT TCATTAGCTG CTGATTTTGT TGAAAGTAAG GATGTTTGCA AAAACTATGC 960
TGAGGCAAAG GATGTCTTCC TGGGCATGTT TTTGTATGAA TATGCAAGAA GGCATCCTGA 1020
TTACTCTGTC GTGCTGCTGC TGAGACTTGC CAAGACATAT GAAACCACTC TAGAGAAGTG 1080
CTGTGCCGCT GCAGATCCTC ATGAATGCTA TGCCAAAGTG TTCGATGAAT TTAAACCTCT 1140
TGTGGAAGAG CCTCAGAATT TAATCAAACA AAACTGTGAG CTTTTTAAGC AGCTTGGAGA 1200
GTACAAATTC CAGAATGCGC TATTAGTTCG TTACACCAAG AAAGTACCCC AAGTGTCAAC 1260
TCCAACTCTT GTAGAGGTCT CAAGAAACCT AGGAAAAGTG GGCAGCAAAT GTTGTAAACA 1320
TCCTGAAGCA AAAAGAATGC CCTGTGCAGA AGACTATCTA TCCGTGGTCC TGAACCAGTT 1380
ATGTGTGTTG CATGAGAAAA CGCCAGTAAG TGACAGAGTC ACAAAATGCT GCACAGAGTC 1440
CTTGGTGAAC AGGCGACCAT GCTTTTCAGC TCTGGAAGTC GATGAAACAT ACGTTCCCAA 1500
AGAGTTTAAT GCTGAAACAT TCACCTTCCA TGCAGATATA TGCACACTTT CTGAGAAGGA 1560
GAGACAAATC AAGAAACAAA CTGCACTTGT TGAGCTTGTG AAACACAAGC CCAAGGCAAC 1620
AAAAGAGCAA CTGAAAGCTG TTATGGATGA TTTCGCAGCT TTTGTAGAGA AGTGCTGCAA 1680
GGCTGACGAT AAGGAGACCT GCTTTGCCGA GGAGGGTAAA AAACTTGTTG CTGCAAGTCA 1740
AGCTGCCTTA GGCTTATAAC ATCTACATTT AAAAGCATCT CAGCCTACCA TGAGAATAAG 1800
AGAAAGAAAA TGAAGATCAA AAGCTTATTC ATCTGTTTTC TTTTTCGTTG GTGTAAAGCC 1860
AACAC 1865

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 352 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(vii) IMMEDIATE SOURCE:
(B) CLONE: native proBPN' amino acid sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys Gln
1               5                   10                  15

Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu
            20                  25                  30

Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala Ser
        35                  40                  45

Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys Lys Asp Pro Ser
    50                  55                  60

Val Ala Tyr Val Glu Glu Asp His Val Ala His Ala Tyr Ala Gln Ser
65                  70                  75                  80

Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala Leu His Ser Gln
                85                  90                  95

Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
            100                 105                 110

Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala Ser Met Val
        115                 120                 125

Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn Asn Ser His Gly Thr His
    130                 135                 140

Val Ala Gly Thr Val Ala Ala Leu Asn Asn Ser Ile Gly Val Leu Gly
145                 150                 155                 160

Val Ala Pro Ser Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp
                165                 170                 175

Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile
            180                 185                 190

Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly
        195                 200                 205

Ser Ala Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val
    210                 215                 220

Val Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
225                 230                 235                 240

Thr Val Gly Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly Ala
                245                 250                 255

Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly Pro Glu
            260                 265                 270

Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr Leu Pro Gly
        275                 280                 285

Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala Ser Pro His Val
    290                 295                 300

Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
305                 310                 315                 320

Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys Leu Gly Asp
                325                 330                 335

Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val Gln Ala Ala Ala Gln
            340                 345                 350



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1056 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: native proBPN' coding sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

GCAGGGAAAT CAAACGGGGA AAAGAAATAT ATTGTCGGGT TTAAACAGAC AATGAGCACG 60
ATGAGCGCCG CTAAGAAGAA AGATGTCATT TCTGAAAAAG GCGGGAAAGT GCAAAAGCAA 120
TTCAAATATG TAGACGCAGC TTCAGCTACA TTAAACGAAA AAGCTGTAAA AGAATTGAAA 180
AAAGACCCGA GCGTCGCTTA CGTTGAAGAA GATCACGTAG CACATGCGTA CGCGCAGTCC 240
GTGCCTTACG GCGTATCACA AATTAAAGCC CCTGCTCTGC ACTCTCAAGG CTACACTGGA 300
TCAAATGTTA AAGTAGCGGT TATCGACAGC GGTATCGATT CTTCTCATCC TGATTTAAAG 360
GTAGCAGGCG GAGCCAGCAT GGTTCCTTCT GAAACAAATC CTTTCCAAGA CAACAACTCT 420
CACGGAACTC ACGTTGCCGG CACAGTTGCG GCTCTTAATA ACTCAATCGG TGTATTAGGC 480
GTTGCGCCAA GCGCATCACT TTACGCTGTA AAAGTTCTCG GTGCTGACGG TTCCGGCCAA 540
TACAGCTGGA TCATTAACGG AATCGAGTGG GCGATCGCAA ACAATATGGA CGTTATTAAC 600
ATGAGCCTCG GCGGACCTTC TGGTTCTGCT GCTTTAAAAG CGGCAGTTGA TAAAGCCGTT 660
GCATCCGGCG TCGTAGTCGT TGCGGCAGCC GGTAACGAAG GCACTTCCGG CAGCTCAAGC 720
ACAGTGGGCT ACCCTGGTAA ATACCCTTCT GTCATTGCAG TAGGCGCTGT TGACAGCAGC 780
AACCAAAGAG CATCTTTCTC AAGCGTAGGA CCTGAGCTTG ATGTCATGGC ACCTGGCGTA 840
TCTATCCAAA GCACGCTTCC TGGAAACAAA TACGGGGCGT ACAACGGTAC GTCAATGGCA 900
TCTCCGCACG TTGCCGGAGC GGCTGCTTTG ATTCTTTCTA AGCACCCGAA CTGGACAAAC 960
ACTCAAGTCC GCAGCAGTTT AGAAAACACC ACTACAAAAC TTGGTGATTC TTTCTACTAT 1020
GGAAAAGGGC TGATCAACGT ACAGGCGGCA GCTCAG 1056
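
As a quick consistency check on the listing, the first 60 nucleotides of SEQ ID NO: 14 can be translated with the standard genetic code and compared with the first 20 residues of SEQ ID NO: 13. The Python sketch below is illustrative only and is not part of the original disclosure; the function name `translate` and the one-letter rendering of the residues are assumptions made for the example.

```python
# Illustrative check, not part of the original sequence listing.
# Standard genetic code built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) in frame 1; '*' marks a stop."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 1-60 of SEQ ID NO: 14 as printed in the listing.
seq14_start = "GCAGGGAAAT CAAACGGGGA AAAGAAATAT ATTGTCGGGT TTAAACAGAC AATGAGCACG"
# Residues 1-20 of SEQ ID NO: 13 (Ala Gly Lys Ser Asn ...) in one-letter code.
seq13_start = "AGKSNGEKKYIVGFKQTMST"

print(translate(seq14_start) == seq13_start)
```

The same function can be applied to the full 1056-nucleotide sequence to compare it against all 352 residues of SEQ ID NO: 13.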

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(vii) IMMEDIATE SOURCE:
(B) CLONE: subtilisin BPN' pro-peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys Gln
1               5                   10                  15
Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu
            20                  25                  30
Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala Ser
        35                  40                  45
Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys Lys Asp Pro Ser
    50                  55                  60
Val Ala Tyr Val Glu Glu Asp His Val Ala His Ala Tyr
65                  70                  75

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 275 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(vii) IMMEDIATE SOURCE:
(B) CLONE: native mature BPN' amino acid sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala Leu
1               5                   10                  15
His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val Ile Asp
            20                  25                  30
Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
        35                  40                  45
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn Asn Ser His
    50                  55                  60
Gly Thr His Val Ala Gly Thr Val Ala Ala Leu Asn Asn Ser Ile Gly
65                  70                  75                  80
Val Leu Gly Val Ala Pro Ser Ala Ser Leu Tyr Ala Val Lys Val Leu
                85                  90                  95
Gly Ala Asp Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly Ile Glu
            100                 105                 110
Trp Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu Gly Gly
        115                 120                 125
Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala
    130                 135                 140
Ser Gly Val Val Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly
145                 150                 155                 160
Ser Ser Ser Thr Val Gly Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala
                165                 170                 175
Val Gly Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val
            180                 185                 190
Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
        195                 200                 205
Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala Ser
    210                 215                 220
Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Asn
225                 230                 235                 240
Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
                245                 250                 255
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val Gln Ala
            260                 265                 270
Ala Ala Gln
        275

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 275 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(vii) IMMEDIATE SOURCE:
(B) CLONE: amino acid sequence of mature BPN' variant

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:





Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala Leu
1               5                   10                  15
His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val Ile Asp
            20                  25                  30
Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
        35                  40                  45
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Thr Asn Ser His
    50                  55                  60
Gly Thr His Val Ala Gly Thr Val Ala Ala Leu Thr Asn Ser Ile Gly
65                  70                  75                  80
Val Leu Gly Val Ala Pro Ser Ala Ser Leu Tyr Ala Val Lys Val Leu
                85                  90                  95
Gly Ala Asp Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly Ile Glu
            100                 105                 110
Trp Ala Ile Ala Asn Asn Met Asp Val Ile Thr Met Ser Leu Gly Gly
        115                 120                 125
Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala
    130                 135                 140
Ser Gly Val Val Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly
145                 150                 155                 160
Ser Ser Ser Thr Val Gly Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala
                165                 170                 175
Val Gly Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val
            180                 185                 190
Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
        195                 200                 205
Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Ser Gly Thr Ser Met Ala Ser
    210                 215                 220
Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Thr
225                 230                 235                 240
Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
                245                 250                 255
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val Gln Ala
            260                 265                 270
Ala Ala Gln
        275

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1260 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: codon-optimized 3D signal peptide-AAT DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

ATGAAGAACA CCTCCTCCCT CTGCCTCCTG CTGCTCGTGG TCCTCTGCTC CCTGACCTGC 60
AACAGCGGCC AGGCCGAGGA CCCGCAGGGC GACGCCGCCC AGAAGACCGA CACCAGCCAC 120
CACGACCAGG ACCACCCGAC GTTCAACAAG ATCACCCCGA ATTTGGCCGA ATTCGCCTTC 180
AGCCTGTACC GCCAGCTCGC GCACCAGTCC AACTCCACCA ACATCTTCTT CAGCCCGGTG 240
AGCATCGCCA CCGCCTTCGC CATGCTGTCC CTGGGTACCA AGGCGGACAC CCACGACGAG 300
ATCCTCGAAG GGCTGAACTT CAACCTGACG GAGATCCCGG AGGCGCAGAT CCACGAGGGC 360
TTCCAGGAGC TGCTCAGGAC GCTCAACCAG CCGGACTCCC AGCTCCAGCT CACCACCGGC 420
AACGGGCTCT TCCTGTCCGA GGGCCTCAAG CTCGTCGATA AGTTCCTGGA GGACGTGAAG 480
AAGCTCTACC ACTCCGAGGC GTTCACCGTC AACTTCGGGG ACACCGAGGA GGCCAAGAAG 540
CAGATCAACG ACTACGTCGA GAAGGGGACC CAGGGCAAGA TCGTGGACCT GGTCAAGGAA 600
TTGGACAGGG ACACCGTCTT CGCGCTCGTC AACTACATCT TCTTCAAGGG CAAGTGGGAG 660
CGCCCGTTCG AGGTGAAGGA CACCGAGGAG GAGGACTTCC ACGTCGACCA GGTCACCACC 720
GTCAAGGTCC CGATGATGAA GAGGCTCGGC ATGTTCAACA TCCAGCACTG CAAGAAGCTC 780
TCCAGCTGGG TGCTCCTCAT GAAGTACCTG GGGAACGCCA CCGCCATCTT CTTCCTGCCG 840
GACGAGGGCA AGCTCCAGCA CCTGGAGAAC GAGCTGACGC ACGACATCAT CACGAAGTTC 900
CTGGAGAACG AGGACAGGCG CTCCGCTAGC CTCCACCTCC CGAAGCTGAG CATCACCGGC 960
ACGTACGACC TGAAGAGCGT GCTGGGCCAG CTGGGCATCA CGAAGGTCTT CAGCAACGGC 1020
GCGGACCTCT CCGGCGTGAC GGAGGAGGCC CCCCTGAAGC TCTCCAAGGC CGTGCACAAG 1080
GCGGTGCTCA CGATCGACGA GAAGGGGACG GAAGCTGCCG GGGCCATGTT CCTGGAGGCC 1140
ATCCCCATGT CCATCCCGCC CGAGGTCAAG TTCAACAAGC CCTTCGTCTT CCTGATGATC 1200
GAGCAGAACA CGAAGAGCCC CCTCTTCATG GGGAAGGTCG TCAACCCCAC GCAGAAGTGA 1260

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1382 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: codon-optimized 3D signal peptide-ATIII DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATGAAGAACA CCTCCTCCCT CTGCCTCCTG CTGCTCGTGG TCCTCTGCTC CCTGACCTGC 60
AACAGCGGCC AGGCCCACGG AAGCCCTGTG GACATCTGCA CAGCCAAGCC GCGGGACATT 120
CCCATGAATC CCATGTGCAT TTACCGCTCC CCGGAGAAGA AGGCAACTGA GGATGAGGGC 180
TCAGAACAGA AGATCCCGGA GGCCACCAAC CGGCGTGTCT GGGAACTGTC CAAGGCCAAT 240
TCCCGCTTTG CTACCACTTT CTATCAGCAC CTGGCAGATT CCAAGAATGA CAATGATAAC 300
ATTTTCCTGT CACCCCTGAG TATCTCCACG GCTTTTGCTA TGACCAAGCT GGGTGCCTGT 360
AATGACACCC TCCAGCAACT GATGGAGGTA TTTAAGTTTG ACACCATATC TGAGAAAACA 420
TCTGATCAGA TCCACTTCTT CTTTGCCAAA CTGAACTGCC GACTCTATCG AAAAGCCAAC 480
AAATCCTCCA AGTTAGTATC AGCCAATCGC CTTTTTGGAG ACAAATCCCT TACCTTCAAT 540
GAGACCTACC AGGACATCAG TGAGTTGGTA TATGGAGCCA AGCTCCAGCC CCTGGACTTC 600
AAGGAAAATG CAGAGCAATC CAGAGCGGCC ATCAACAAAT GGGTGTCCAA TAAGACCGAA 660
GGCCGAATCA CCGATGTCAT TCCCTCGGAA GCCATCAATG AGCTCACTGT TCTGGTGCTG 720
GTTAACACCA TTTACTTCAA GGGCCTGTGG AAGTCAAAGT TCAGCCCTGA GAACACAAGG 780
AAGGAACTGT TCTACAAGGC TGATGGAGAG TCGTGTTCAG CATCTATGAT GTACCAGGAA 840
GGCAAGTTCC GTTATCGGCG CGTGGCTGAA GGCACCCAGG TGCTTGAGTT GCCCTTCAAA 900
GGTGATGACA TCACCATGGT CCTCATCTTG CCCAAGCCTG AGAAGAGCCT GGCCAAGGTG 960
GAGAAGGAAC TCACCCCAGA GGTGCTGCAG GAGTGGCTGG ATGAATTGGA GGAGATGATG 1020
CTGGTGGTTC ACATGCCCCG CTTCCGCATT GAGGACGGCT TCAGTTTGAA GGAGCAGCTG 1080
CAAGACATGG GCCTTGTCGA TCTGTTCAGC CCTGAAAAGT CCAAACTCCC AGGTATTGTT 1140
GCAGAAGGCC GAGATGACCT CTATGTCTCA GATGCATTCC ATAAGGCATT TCTTGAGGTA 1200
AATGAAGAAG GCAGTGAAGC AGCTGCAAGT ACCGCTGTTG TGATTGCTGG CCGTTCGCTA 1260
AACCCCAACA GGGTGACTTT CAAGGCCAAC AGGCCCTTCC TGGTTTTTAT AAGAGAAGTT 1320
CCTCTGAACA CTATTATCTT CATGGGCAGA GTAGCCAACC CTTGTGTTAA GTAACTCGAG 1380
CC 1382

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1940 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: codon-optimized 3D signal peptide-HSA DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

ATGAAGAACA CCTCCTCCCT CTGCCTCCTG CTGCTCGTGG TCCTCTGCTC CCTGACCTGC 60
AACAGCGGCC AGGCCAGATG CACACAAGAG TGAGGTTGCT CATCGGTTTA AAGATTTGGG 120
AGAAGAAAAT TTCAAAGCCT TGGTGTTGAT TGCCTTTGCT CAGTATCTTC AGCAGTGTCC 180
ATTTGAAGAT CATGTAAAAT TAGTGAATGA AGTAACTGAA TTTGCAAAAA CATGTGTAGC 240
TGATGAGTCA GCTGAAAATT GTGACAAATC ACTTCATACC CTTTTTGGAG ACAAATTATG 300
CACAGTTGCA ACTCTTCGTG AAACCTATGG TGAAATGGCT GACTGCTGTG CAAAACAAGA 360
ACCTGAGAGA AATGAATGCT TCTTGCAACA CAAAGATGAC AACCCAAACC TCCCCCGATT 420
GGTGAGACCA GAGGTTGATG TGATGTGCAC TGCTTTTCAT GACAATGAAG AGACATTTTT 480
GAAAAAATAC TTATATGAAA TTGCCAGAAG ACATCCTTAC TTTTATGCCC CGGAACTCCT 540
TTTCTTTGCT AAAAGGTATA AAGCTGCTTT TACAGAATGT TGCCAAGCTG CTGATAAAGC 600
TGCCTGCCTG TTGCCAAAGC TCGATGAACT TCGGGATGAA GGGAAGGCTT CGTCTGCCAA 660
ACAGAGACTC AAATGTGCCA GTCTCCAAAA ATTTGGAGAA AGAGCTTTCA AAGCATGGGC 720
AGTGGCTCGC CTGAGCCAGA GATTTCCCAA AGCTGAGTTT GCAGAAGTTT CCAAGTTAGT 780
GACAGATCTT ACCAAAGTCC ACACGGAATG CTGCCATGGA GATCTGCTTG AATGTGCTGA 840
TGACAGGGCG GACCTTGCCA AGTATATCTG TGAAAATCAG GATTCGATCT CCAGTAAACT 900
GAAGGAATGC TGTGAAAAAC CTCTGTTGGA AAAATCCCAC TGCATTGCCG AAGTGGAAAA 960
TGATGAGATG CCTGCTGACT TGCCTTCATT AGCTGCTGAT TTTGTTGAAA GTAAGGATGT 1020
TTGCAAAAAC TATGCTGAGG CAAAGGATGT CTTCCTGGGC ATGTTTTTGT ATGAATATGC 1080
AAGAAGGCAT CCTGATTACT CTGTCGTGCT GCTGCTGAGA CTTGCCAAGA CATATGAAAC 1140
CACTCTAGAG AAGTGCTGTG CCGCTGCAGA TCCTCATGAA TGCTATGCCA AAGTGTTCGA 1200
TGAATTTAAA CCTCTTGTGG AAGAGCCTCA GAATTTAATC AAACAAAACT GTGAGCTTTT 1260
TAAGCAGCTT GGAGAGTACA AATTCCAGAA TGCGCTATTA GTTCGTTACA CCAAGAAAGT 1320
ACCCCAAGTG TCAACTCCAA CTCTTGTAGA GGTCTCAAGA AACCTAGGAA AAGTGGGCAG 1380
CAAATGTTGT AAACATCCTG AAGCAAAAAG AATGCCCTGT GCAGAAGACT ATCTATCCGT 1440
GGTCCTGAAC CAGTTATGTG TGTTGCATGA GAAAACGCCA GTAAGTGACA GAGTCACAAA 1500
ATGCTGCACA GAGTCCTTGG TGAACAGGCG ACCATGCTTT TCAGCTCTGG AAGTCGATGA 1560
AACATACGTT CCCAAAGAGT TTAATGCTGA AACATTCACC TTCCATGCAG ATATATGCAC 1620
ACTTTCTGAG AAGGAGAGAC AAATCAAGAA ACAAACTGCA CTTGTTGAGC TTGTGAAACA 1680
CAAGCCCAAG GCAACAAAAG AGCAACTGAA AGCTGTTATG GATGATTTCG CAGCTTTTGT 1740
AGAGAAGTGC TGCAAGGCTG ACGATAAGGA GACCTGCTTT GCCGAGGAGG GTAAAAAACT 1800
TGTTGCTGCA AGTCAAGCTG CCTTAGGCTT ATAACATCTA CATTTAAAAG CATCTCAGCC 1860
TACCATGAGA ATAAGAGAAA GAAAATGAAG ATCAAAAGCT TATTCATCTG TTTTCTTTTT 1920
CGTTGGTGTA AAGCCAACAC 1940

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1140 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(vii) IMMEDIATE SOURCE:
(B) CLONE: codon-optimized 3D signal peptide-BPN' DNA sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

ATGAAGAACA CCTCCTCCCT CTGCCTCCTG CTGCTCGTGG TCCTCTGCTC CCTGACCTGC 60
AACAGCGGCC AGGCCGCTGG CAAGAGCAAC GGGGAGAAGA AGTACATCGT CGGCTTCAAG 120
CAGACCATGA GCACCATGAG CGCCGCCAAG AAGAAGGACG TCATCAGCGA GAAGGGCGGC 180
AAGGTACAGA AGCAGTTCAA GTACGTGGAC GCCGCCAGCG CCACCCTCAA CGAGAAGGCC 240
GTCAAGGAGC TGAAGAAGGA CCCGAGCGTC GCCTACGTCG AGGAGGACCA CGTCGCCCAC 300
GCATATGCAC AGAGCGTCCC GTACGGCGTC AGCCAGATCA AGGCCCCGGC CCTCCACAGC 360
CAGGGCTACA CCGGCAGCAA CGTCAAGGTC GCCGTCATCG ACAGCGGCAT CGACAGCAGC 420
CACCCGGACC TCAAGGTCGC CGGCGGAGCT AGCATGGTCC CGAGCGAGAC CAACCCGTTC 480
CAGGACACCA ACAGCCATGG CACCCACGTC GCCGGCACCG TCGCCGCCCT CACCAACAGC 540
ATCGGCGTCC TCGGCGTCGC CCCGAGCGCC AGCCTCTACG CCGTCAAGGT ACTCGGCGCC 600
GACGGCAGCG GCCAGTACAG CTGGATCATC AACGGCATCG AGTGGGCCAT CGCCAACAAC 660
ATGGACGTCA TCACCATGAG CCTCGGCGGC CCGAGCGGCA GCGCCGCCCT CAAGGCCGCC 720
GTCGACAAGG CCGTCGCCAG CGGCGTCGTC GTCGTCGCCG CCGCCGGCAA CGAGGGCACC 780
AGCGGCAGCA GCAGCACCGT CGGCTACCCG GGCAAGTACC CGAGCGTCAT CGCCGTCGGC 840
GCCGTGGACA GCAGCAACCA GCGCGCGAGC TTCAGCAGCG TCGGCCCGGA GCTGGACGTC 900
ATGGCCCCGG GCGTCAGCAT CCAGAGCACC CTCCCGGGCA ACAAGTACGG CGCCTACAGC 960
GGCACCAGCA TGGCCAGCCC GCACGTCGCC GGCGCCGCTG CACTCATCCT CAGCAAGCAC 1020
CCGACCTGGA CCAACACCCA GGTCCGCAGC AGCCTGGAGA ACACCACCAC CAAGCTCGGC 1080
GACAGCTTCT ACTACGGCAA GGGCCTCATC AACGTCCAGG CCGCCGCCCA GTGACTCGAG 1140
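
SEQ ID NO: 21 is described as codon-optimized, so its pro-peptide region should encode the same residues as the native sequence (SEQ ID NO: 14) with different codon choices. The Python sketch below is illustrative only and not part of the original disclosure: it compares the first 20 pro-peptide codons of the two sequences. The helper names `translate` and `gc_fraction` are assumptions made for the example, and the offset (nucleotides 76-135 of SEQ ID NO: 21) assumes a 75-nucleotide signal-peptide coding region, as implied by the sequence lengths in the listing.

```python
# Illustrative comparison, not part of the original sequence listing.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) in frame 1."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_fraction(dna: str) -> float:
    """Fraction of G and C bases in a DNA string (spaces ignored)."""
    dna = dna.replace(" ", "").upper()
    return sum(base in "GC" for base in dna) / len(dna)


# SEQ ID NO: 14, nucleotides 1-60 (native pro-peptide codons 1-20).
native = "GCAGGGAAAT CAAACGGGGA AAAGAAATAT ATTGTCGGGT TTAAACAGAC AATGAGCACG"
# SEQ ID NO: 21, nucleotides 76-135 (assumed start of the pro-peptide).
optimized = "GCTGGCAAGA GCAACGGGGA GAAGAAGTAC ATCGTCGGCT TCAAGCAGAC CATGAGCACC"

same_protein = translate(native) == translate(optimized)
print(same_protein, gc_fraction(native) < gc_fraction(optimized))
```

Over this window the two sequences encode the same residues while the optimized sequence has the higher G+C content; the sketch makes no claim about the rest of the sequence.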

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(vii) IMMEDIATE SOURCE:
(B) CLONE: N-terminus of mature AAT

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Glu Asp Pro Gln Gly Asp Ala Ala Gln Lys Thr Asp Thr
1               5                   10

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

GCTTGACCTG TAACTCGGGC CAGGCGAGCT 30

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CGCCTAGCCC GAGTTACAGG TCAAGCAGCT 30

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

AGCTCCATGG CCGTGGCTCG AGTCTAGACG CGTCCCC 37

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

GGGGACGCGT CTAGACTCGA GCCACGGCCA TGG 33
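
Read together, SEQ ID NO: 25 and SEQ ID NO: 26 appear to be a pair of complementary oligonucleotides. The Python sketch below is illustrative only and not part of the original disclosure: it checks that the reverse complement of SEQ ID NO: 26 matches the 3' portion of SEQ ID NO: 25 and reports the unpaired 5' nucleotides. The function name `reverse_complement` is an assumption made for the example, and no statement is made here about how the oligonucleotides were used.

```python
# Illustrative check, not part of the original sequence listing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (spaces ignored)."""
    return dna.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


seq25 = "AGCTCCATGG CCGTGGCTCG AGTCTAGACG CGTCCCC".replace(" ", "")
seq26 = "GGGGACGCGT CTAGACTCGA GCCACGGCCA TGG".replace(" ", "")

paired = reverse_complement(seq26)           # strand that pairs with SEQ ID NO: 26
fully_paired = seq25.endswith(paired)        # does SEQ ID NO: 25 end with it?
overhang = seq25[: len(seq25) - len(paired)]  # unpaired 5' bases of SEQ ID NO: 25

print(fully_paired, overhang)
```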
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

GCATGCAGGT GCTGAACACC ATGGTGAACA AACAC 35

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

TTCTTGTCCC TTTCGGTCCT CATCGTCCTC CT 32

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

TGGCCTCTCC TCCAACTTGA CAGCCGGGAG CT 32

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

TTCACCATGG TGTTCAGCAC CTGCATGCTG CA 32

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CGATGAGGAC CGAAAGGGAC AAGAAGTGTT TG 32

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CCCGGCTGTC AAGTTGGAGG AGAGGCCAAG GAGGA 35

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

GAGGATCCCC AGGGAGATGC TGCCCAGAA 29

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

CGCGCTCGAG TTATTTTTGG GTGGGATTCA CCAC 34

1. A method of producing, in monocot plant cells, a mature heterologous protein selected
from the group consisting of

(i) mature, glycosylated α1-antitrypsin (AAT) having the same N-terminal amino acid
sequence as mature AAT produced in humans and a glycosylation pattern which increases serum
half-life substantially over that of mature non-glycosylated AAT;

(ii) mature, glycosylated antithrombin III (ATIII) having the same N-terminal amino acid
sequence as mature ATIII produced in humans;

(iii) mature human serum albumin (HSA) having the same N-terminal amino acid sequence
as mature HSA produced in humans and having the folding pattern of native mature HSA as
evidenced by its bilirubin-binding characteristics; and

(iv) mature, active subtilisin BPN' (BPN') having the same N-terminal amino acid sequence
as BPN' produced in Bacillus;

the method comprising:

(a) obtaining monocot cells transformed with a chimeric gene having (i) a monocot
transcriptional regulatory region, inducible by addition or removal of a small molecule, or during
seed maturation, (ii) a first DNA sequence encoding the heterologous protein, and (iii) a second
DNA sequence encoding a signal peptide, said first and second DNA sequences in translation-frame
and encoding a fusion protein, and wherein (i) the transcriptional regulatory region is operably
linked to the second DNA sequence, and (ii) said signal peptide is effective to facilitate secretion of
the mature heterologous protein from the transformed cells;

(b) cultivating the transformed cells under conditions effective to induce said transcriptional
regulatory region, thereby promoting expression of the fusion protein and secretion of the mature
heterologous protein from the transformed cells; and

(c) isolating said mature heterologous protein produced by the transformed cells.

2. The method of claim 1, wherein said first DNA sequence encodes proBPN', said
cultivating includes cultivating said transformed cells at a pH between 5-6 to promote expression
and secretion of proBPN' from the cells, and said isolating step includes incubating the proBPN'
under conditions effective to allow the autoconversion of proBPN' to active mature BPN'.

3. The method of claim 1, wherein said first DNA sequence encodes mature BPN',
and said method further includes:

transforming said cells with a second chimeric gene containing (i) a transcriptional
regulatory region inducible by addition or removal of a small molecule, or during seed maturation,
(ii) a third DNA sequence encoding the pro-peptide moiety of BPN', and (iii) a fourth DNA
sequence encoding a signal polypeptide, where said fourth DNA sequence is operably linked to said
transcriptional regulatory region and said third DNA sequence, and where said signal polypeptide is
in translation-frame with said pro-peptide moiety and is effective to facilitate secretion of expressed
pro-peptide moiety from the transformed cells;

said cultivating step includes cultivating the transformed cells at a pH between 5-6 to
promote expression and secretion of BPN' and the pro-peptide moiety from the cells;

and said isolating step includes incubating the BPN' and the pro-moiety under conditions
effective to allow the conversion of BPN' to active mature BPN', and isolating the active mature
BPN'.



4. The method of claim 1, wherein said signal peptide is the RAmy3D signal peptide having
the amino acid sequence identified by SEQ ID NO:1.

5. The method of claim 1, wherein said second DNA sequence encodes the RAmy3D signal
peptide (SEQ ID NO:1) and has the codon-optimized nucleotide sequence identified by SEQ ID
NO:3.

6. The method of claim 1, wherein said signal peptide is the RAmy1A signal peptide having
the amino acid sequence identified by SEQ ID NO:4.

7. The method of claim 1, wherein the second DNA sequence, the first DNA sequence, or
both the second and the first DNA sequence, is codon-optimized for enhanced expression in said
plant.

8. The method of claim 1, wherein said transcriptional regulatory region is a promoter
derived from a rice or barley α-amylase gene selected from the group consisting of the RAmy1A,
RAmy1B, RAmy2A, RAmy3A, RAmy3B, RAmy3C, RAmy3D, and RAmy3E, pM/C, gKAmy141,
gKAmy155, Amy32b, and HV18 genes.

9. The method of claim 8, wherein the chimeric gene further comprises, between said
transcriptional regulatory region and said second DNA coding sequence, the 5' untranslated region
of an inducible monocot gene selected from the group consisting of RAmy1A, RAmy3B, RAmy3C,
RAmy3D, HV18, and RAmy3E.

10. The method of claim 8, wherein said chimeric gene further comprises, downstream of
the sequence encoding said fusion protein, the 3' untranslated region of an inducible monocot gene
derived from a rice or barley α-amylase gene selected from the group consisting of the RAmy1A,
RAmy1B, RAmy2A, RAmy3A, RAmy3B, RAmy3C, RAmy3D, and RAmy3E, pM/C, gKAmy141,
gKAmy155, Amy32b, and HV18 genes.

11. The method of claim 1, wherein said cultivating includes culturing the transformed plant
cells in a sugar-free or sugar-depleted medium, the transcriptional regulatory region is derived from
the RAmy3E or RAmy3D gene, the 5' untranslated region is derived from the RAmy1A gene and
has the sequence identified by SEQ ID NO:5, and the 3' untranslated region is derived from the
RAmy1A gene.

12. The method of claim 1, wherein the transformed cells are aleurone cells of mature
seeds, the transcriptional regulatory region is upregulated by addition of a small molecule to
promote seed germination, and said cultivating includes germinating said seeds, either in
embryonated or de-embryonated form.

13. The method of claim 12, wherein the transcriptional regulatory region is a rice
α-amylase RAmy1A promoter or a barley HV18 promoter, and said small molecule is gibberellic acid.

14. A mature heterologous protein produced by the method of claim 1, wherein said protein
is selected from the group consisting of:

(i) mature glycosylated α1-antitrypsin (AAT) having the same N-terminal amino acid
sequence as mature AAT produced in humans and having a glycosylation pattern which increases
serum half-life substantially over that of non-glycosylated mature AAT;

(ii) mature glycosylated antithrombin III (ATIII) having the same N-terminal amino acid
sequence as mature ATIII produced in humans; and

(iii) mature glycosylated subtilisin BPN' (BPN') having the same N-terminal amino acid
sequence as BPN' produced in Bacillus;

wherein said protein has a glycosylation pattern characteristic of proteins produced in said
monocot plant.

15. The method of claim 1, wherein said monocot plant cells are transformed rice, barley,
corn, wheat, oat, rye, sorghum, or millet cells.

16. The method of claim 1, wherein said monocot plant cells are transformed rice or barley
cells.

17. Plant cells capable of producing the mature heterologous protein according to the
method of claim 1, wherein said cultivating includes culturing the transformed plant cells in a
sugar-free or sugar-depleted medium, the transcriptional regulatory region is derived from the RAmy3E or
RAmy3D gene, the 5' untranslated region is derived from the RAmy1A gene and has the sequence
identified by SEQ ID NO:5, and the 3' untranslated region is derived from the RAmy1A gene.

18. Seeds capable of producing the mature heterologous protein according to the method of
claim 1, wherein said transformed cells are aleurone cells, the transcriptional regulatory region is
upregulated by addition of a small molecule to promote seed germination, and said cultivating
includes germinating said seeds, either in embryonated or de-embryonated form.

WO 98/36085    PCT/US98/03068

[Drawing sheets 1/11 to 11/11, substitute sheets (Rule 26). The images are not reproduced; only the labels recoverable from the extracted text are listed.]

Sheet 1/11: [content not recoverable]

Sheet 2/11, Fig. 3A: p3D v1.0 multiple cloning site (MCS), positions 1590 to 1620, with sites labeled NcoI, SacI, Ecl136II, XbaI, BstXI, XhoI, MluI and SmaI:

GC GAGCT C CATG GCCGT GG CTC GAG TC TAGAC GCGTC CCGGG
CGCTCGA GGTAC CGGCA CCGAG CTCAG ATCTG CGCAG GGCCC

SacI blunted with T4 DNA polymerase; native AAT PCR product; XhoI.

Fig. 3B: p3D.AAT, 7195 bp.

Fig. 4: [content not recoverable]

Fig. 5: [gel image; 31 kD marker labeled]

Sheet 6/11: [content not recoverable]

Sheet 7/11, Fig. 7: [gel image; Human and Rice lane groups, each labeled -, 0, 5, 30; markers at 60 kD, 45 kD and 31 kD; band labeled "Complex"]

Fig. 8: N > E-D-P-Q-G-D-A-A-Q-K-T-D-T

Sheet 8/11, Fig. 9: [content not recoverable]

Sheet 9/11, Fig. 10: [content not recoverable]

Sheet 10/11, Fig. 11A: [bar chart; vertical axis 0 to 0.35; horizontal axis "API06 cell lines"; Ave. 0.124]

Fig. 11B: [bar chart; vertical axis 0 to 0.35; horizontal axis "APIOl cell lines"; Ave. 0.038]

Sheet 11/11: [content not recoverable]



INTERNATIONAL SEARCH REPORT 



International Application No

PCT/US 98/03068 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC 6 C12N15/82 C12N15/57 C12N15/15 C12N15/14 C12P21/02 



According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED



Minimum documentation searched (classification system followed by classification symbols)

IPC 6 C12N C12P



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched



Electronic data base consulted during the international search (name of data base and, where practical, search terms used)



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category * Citation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



X

WO 95 14099 A (RODRIGUEZ RAYMOND L; UNIV CALIFORNIA (US)) 26 May 1995
see the whole document

1,4,6,8-18


Y

WO 92 01042 A (NOVONORDISK AS) 23 January 1992
see page 6, line 15 - line 19

1,5


Y

JENSEN L G ET AL: "TRANSGENIC BARLEY EXPRESSING A PROTEIN-ENGINEERED, THERMOSTABLE (1,3-1,4)-BETA-GLUCANASE DURING GERMINATION"
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, vol. 93, no. 8, April 1996, pages 3487-3491, XP002024710
see the whole document

1



Further documents are listed in the continuation of box C. 



Patent family members are listed in annex. 



* Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed



"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art.

"&" document member of the same patent family



Date of the actual completion of the international search



30 June 1998 



Date of mailing of the international search report 



14/07/1998 



Name and mailing address of the ISA

European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016 



Authorized officer 



Maddox, A 



Form PCT/ISA/210 (second sheet) (July 1992)

C.(Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT



Category


Citation of document, with indication, where appropriate, of the relevant passages


Relevant to claim No.


P,X


TERASHIMA M ET AL: "Production of
functional human alpha-1-antitrypsin by
rice cell culture; expression and protein
secretion in callus culture (conference
abstract)"

ABSTR.PAP.AM.CHEM.SOC.; (1997) 214 MEET.,
PT.1, AGFD018 CODEN: ACSRAL ISSN:
0065-7727 AMERICAN CHEMICAL SOCIETY, 214TH
ACS NATIONAL MEETING, LAS VEGAS, NV, 7-11
SEPTEMBER, 1997, XP002069835
see abstract 018


1,4,8-10,14-17


A 


THOMAS, B. R. ET AL: "Gene regulation and
protein secretion from plant cell
cultures: the rice alpha-amylase system"
ADVANCES IN PLANT BIOTECHNOLOGY, (1994)
PP. 37-55. STUDIES IN PLANT SCIENCE 4. 85
REF. PUBLISHER: ELSEVIER SCIENCE,
AMSTERDAM ISBN: 0-444-89939-1, XP002069833
see the whole document


1,11


A 


CHAN M-T ET AL: "Novel gene expression
system for plant cells based on induction
of alpha-amylase promoter by carbohydrate
starvation."

JOURNAL OF BIOLOGICAL CHEMISTRY 269 (26).
1994. 17635-17641. ISSN: 0021-9258.
XP002069821

see the whole document


1,11 


A 


US 5 460 952 A (YU SU-MAY ET AL) 24 October 1995
see the whole document


1,11


A 


WO 90 01551 A (ROGERS JOHN C) 22 February 1990
see the whole document


1,12 


A 


WO 91 02066 A (MOGEN INT) 21 February 1991 
see the whole document 


1 


A 


EP 0 348 348 A (CIBA GEIGY AG) 27 December 1989
see examples 41-45


1 



Form PCT/ISA/210 (continuation of second sheet) (July 1992)



INTERNATIONAL SEARCH REPORT 

Information on patent family members



International Application No

PCT/US 98/03068 





Patent document cited in search report / Publication date / Patent family member(s) / Publication date

WO 9514099 A 26-05-1995 US 5693506 A 02-12-1997

AU 1289295 A 06-06-1995

CA 2176834 A 26-05-1995

EP 0788550 A 13-08-1997

JP 9509565 T 30-09-1997



WO 9201042 A 23-01-1992 AU 8219291 A 04-02-1992



US 5460952 A 24-10-1995 JP 7143895 A 06-06-1995

US 5712112 A 27-01-1998



WO 9001551 A 22-02-1990 AU 638409 B 01-07-1993

AU 4037289 A 05-03-1990

EP 0428572 A 29-05-1991

JP 4500153 T 16-01-1992

US 5677474 A 14-10-1997



WO 9102066 A 21-02-1991 NL 8901932 A 18-02-1991

EP 0436003 A 10-07-1991

JP 4502861 T 28-05-1992

US 5650307 A 22-07-1997

US 5716802 A 10-02-1998

US 5753748 A 09-06-1998



EP 0348348 A 27-12-1989 AU 631551 B 03-12-1992

AU 3656889 A 21-12-1989

DK 302289 A 28-02-1990

IL 90640 A 04-01-1998

JP 2046238 A 15-02-1990



Form PCT/ISA/210 (patent family annex) (July 1992)



